Abstract
In recent years, convolutional neural networks (CNNs) have performed well on many machine vision tasks. However, existing software implementations are poorly suited to portable devices. To address this, a CNN system based on a Xilinx all-programmable SoC is designed. It accelerates convolution in parallel and, on an SoC platform with fixed resources, achieves fast detection using only a small amount of hardware resources. The system uses multi-stage pipelining and input data reuse to improve computational efficiency. The hardware performs the CNN computation, while the software handles image preprocessing and post-processing of the detection results, which improves overall operating efficiency. The system supports convolution with multiple kernel sizes, mean pooling, and the non-maximum suppression algorithm, and it accurately locates multiple faces in an image. Experimental results show that at an operating frequency of 100 MHz the system achieves an average computation rate of 0.19 GOP/s, with power consumption only 4.07% of that of a general-purpose CPU.
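The abstract names three operations the system implements: convolution with several kernel sizes, mean pooling, and non-maximum suppression (NMS). Below is a minimal plain-Python sketch of what each of these computes, of the kind one might use as a software reference when checking accelerator output. It is not the authors' implementation. The following are assumptions not stated in the source: convolution is unpadded and computed as cross-correlation (no kernel flip), pooling windows do not overlap, boxes are given as (x1, y1, x2, y2) corners, and NMS is the common greedy variant with an IoU threshold. All function names are hypothetical.

```python
"""Hypothetical software reference model for convolution, mean pooling and NMS.

A sketch under stated assumptions, not the hardware design described above.
"""
from typing import List, Tuple

Matrix = List[List[float]]
Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """Unpadded 2-D convolution written as cross-correlation (no kernel flip).

    The kernel size is taken from `kernel`, so 1x1, 3x3, 5x5, ... all work.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            acc = 0.0
            for i in range(kh):          # one multiply-accumulate per kernel tap
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def mean_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping mean (average) pooling with a size x size window."""
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, discard boxes overlapping it
    by more than iou_thresh, and repeat. Returns kept indices by descending score."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 ramp
    print(conv2d(img, [[1.0, 0.0], [0.0, 1.0]]))
    # [[5.0, 7.0, 9.0], [13.0, 15.0, 17.0], [21.0, 23.0, 25.0]]
    print(mean_pool(img, 2))
    # [[2.5, 4.5], [10.5, 12.5]]
    faces = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    print(nms(faces, [0.9, 0.8, 0.7], 0.5))
    # [0, 2]  (box 1 overlaps box 0 with IoU of about 0.82 and is suppressed)
```

Counting one multiply and one add per kernel tap, `conv2d` performs 2 × kh × kw operations per output pixel. Dividing such an operation count by run time is one common way to arrive at a GOP/s figure like the one reported above, although the source does not state which counting convention it uses.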