Design of a Convolutional Neural Network System Based on SoC

English title: Design of convolutional neural network system based on SoC
Authors: Li Zicong (李子聪); Zeng Yuhang (曾宇航); Xiong Xiaoming (熊晓明)
Affiliations: School of Automation, Guangdong University of Technology (广东工业大学自动化学院); Chipeye Microelectronics Foshan Ltd. (佛山芯珠微电子有限公司)
Keywords: SoC; convolutional neural network; parallelization; hardware-software co-design
Journal code: DZCL
Journal: Electronic Measurement Technology (电子测量技术)
Publication date: 2019-05-23
Publisher: Electronic Measurement Technology (电子测量技术)
Year: 2019
Issue: v.42; No.318
Funding: Supported by the Guangdong Province Science and Technology Project (2017B090909004)
Language: Chinese
Page: DZCL201910022
Page count: 6
CN: 10
ISSN: 11-2175/TN
Classification number: 132-137

Abstract

In recent years, convolutional neural networks (CNNs) have performed well on many machine vision tasks. However, existing software implementations are poorly suited to portable devices. To address this, a CNN system based on a Xilinx all-programmable SoC is designed that accelerates the convolution operation in parallel; on an SoC platform with fixed resources, it needs only a small share of those resources to implement a fast detection system. The system uses multi-stage pipelining and input data reuse to improve computational efficiency. The hardware part performs the CNN computation, while the software part handles image preprocessing and post-processing of the detection results, which improves operating efficiency. The system supports convolution with kernels of different sizes, average pooling, and the non-maximum suppression algorithm, and accurately locates multiple faces in an image. Experimental results show that, at an operating frequency of 100 MHz, the system's average computation rate is 0.19 Gops/s and its power consumption is only 4.07% of that of a general-purpose CPU.
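For readers unfamiliar with two of the operations named in the abstract, the following is a minimal software sketch of average pooling and greedy non-maximum suppression. It is not the authors' implementation: the abstract does not say whether these steps run in hardware or software, and the function names, the `(x1, y1, x2, y2)` box layout, the 2x2 pooling window, and the 0.5 overlap threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def average_pool(feature_map: List[List[float]], size: int = 2) -> List[List[float]]:
    """Non-overlapping average pooling over size x size windows (stride == size)."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect every value inside the current window.
            window = [
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In a multi-face detector, each kept index would correspond to one reported face; two heavily overlapping candidate boxes around the same face collapse to the higher-scoring one, while a distant box survives.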
