Design of a Convolutional Neural Network System Based on SoC
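  • English title: Design of convolutional neural network system based on SoC
  • Authors: 李子聪; 曾宇航; 熊晓明
  • Authors (English): Li Zicong; Zeng Yuhang; Xiong Xiaoming; School of Automation, Guangdong University of Technology; Chipeye Microelectronics Foshan Ltd.
  • Keywords: SoC; convolutional neural network; parallelization; hardware-software co-design
  • Keywords (English, as published): SoC; convolutional neural network; deserialize; hardware software co-designed
  • Journal (Chinese abbreviation): DZCL
  • Journal (English): Electronic Measurement Technology
  • Affiliations: School of Automation, Guangdong University of Technology; Chipeye Microelectronics Foshan Ltd. (佛山芯珠微电子有限公司)
  • Publication date: 2019-05-23
  • Publisher: Electronic Measurement Technology (电子测量技术)
  • Year: 2019
  • Volume/issue: v.42; No.318
  • Funding: Supported by the Guangdong Province Science and Technology Project (2017B090909004)
  • Language: Chinese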
  • Record ID: DZCL201910022
  • Page count: 6
  • Issue (in year): 10
  • CN: 11-2175/TN
  • Pages: 132-137
Abstract
In recent years, convolutional neural networks (CNNs) have performed well on many machine vision tasks, but existing software implementations are poorly suited to portable devices. This paper designs a CNN system based on a Xilinx all-programmable SoC that parallelizes the convolution operations and delivers fast detection with few resources on an SoC platform with fixed resources. The system uses a multi-stage pipeline and input data reuse to improve computational efficiency. The hardware part performs the CNN computation, while the software part handles image preprocessing and post-detection processing, which improves overall run-time efficiency. The system supports convolution with several kernel sizes, average pooling, and the non-maximum suppression algorithm, and accurately locates multiple faces in an image. Experimental results show that, at an operating frequency of 100 MHz, the system's average computation rate is 0.19 Gops/s and its power consumption is only 4.07% of that of a general-purpose CPU.
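The non-maximum suppression step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy IoU-based variant, the function names `iou` and `nms`, the (x1, y1, x2, y2) box layout, the 0.5 threshold, and the sample boxes are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if its overlap with every already-kept box
        # stays at or below the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Made-up detections: two overlapping boxes on one face, one separate face.
    candidates = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    confidences = [0.90, 0.80, 0.70]
    print(nms(candidates, confidences))  # prints [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box is kept as a separate face.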
