Analysis on Information Geometric Measurement of Internal Transfer of Deep Neural Network
  • Chinese Title: 深度神经网络内部迁移的信息几何度量分析
  • Authors: CHEN Li (陈力); FEI Hongxiao (费洪晓); LI Haifeng (李海峰); HE Jiabao (何嘉宝); TAN Fengyun (谭风云)
  • Affiliations: School of Geosciences and Info-Physics, Central South University; School of Software Engineering, Central South University
  • Keywords: deep learning; transfer learning; information geometry
  • Journal: Journal of Hunan University (Natural Sciences) (湖南大学学报(自然科学版))
  • CNKI Journal Code: HNDX; CNKI Article ID: HNDX201902014
  • Publication Date: 2019-02-25; Year: 2019; Volume/Issue: v.46, No.302 (Issue 02)
  • Pages: 102-109 (8 pages)
  • CN: 43-1061/N
  • Funding: National Natural Science Foundation of China (61602525, 61603404, 41571397, 41501442)
  • Language: Chinese
Abstract
When deep neural networks are applied to computer vision problems with only a small amount of data for a new task, a common practice is to initialize training with the weights of a model already trained on a large dataset; models trained this way generalize better. Previous explanations of this phenomenon are mostly based on intuitive analysis and lack rigorous mathematical derivation. This paper recasts this inter-layer learning under a fixed network structure as an internal transfer capability of the deep neural network and formalizes the changes of the learning process as mathematical expressions. To account for the influence of the dataset on the training process, information-geometric analysis is used to determine the metric and connection on the manifold of each dataset, which realizes an embedding mapping between different datasets; the variation of the parameter space is likewise placed in the manifold space to examine their joint influence on the learning process. This yields a mathematical explanation of the internal transfer phenomenon. Analysis and experiments show that internal transfer is a change that allows the network to search for an optimum over a wider space, helping the model reach a relatively better solution during learning.
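For context, the following is the standard information-geometric setup in the sense of Amari; the notation below is a sketch of that textbook machinery and is not reproduced from the paper itself. A parametric family of distributions p(x; θ) forms a statistical manifold whose natural metric is the Fisher information, together with a one-parameter family of α-connections:

    % Fisher information metric on the statistical manifold {p(x; \theta)}
    g_{ij}(\theta) = \mathbb{E}_{x \sim p(x;\theta)}
        \left[ \partial_i \log p(x;\theta)\; \partial_j \log p(x;\theta) \right]

    % Amari's \alpha-connection coefficients, with \ell(x;\theta) = \log p(x;\theta);
    % \alpha = 0 recovers the Levi-Civita connection of g
    \Gamma^{(\alpha)}_{ij,k}(\theta) = \mathbb{E}
        \left[ \left( \partial_i \partial_j \ell
            + \frac{1-\alpha}{2}\, \partial_i \ell\, \partial_j \ell \right)
        \partial_k \ell \right]

Under this view, training moves a point on such a manifold, and comparing datasets amounts to comparing the metrics and connections their distributions induce, which is the machinery the abstract invokes for the embedding mapping between datasets.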
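The training practice the abstract analyzes can also be made concrete in code. The sketch below is an illustration only, not the authors' implementation; the model choice (AlexNet), the use of torchvision's ImageNet weights, and the 257-class head (as for Caltech-256) are all assumptions.

    # A minimal sketch of fine-tuning from pre-trained weights (assumed setup,
    # not the paper's code): the new task starts from weights learned on a
    # large dataset instead of a random initialization.
    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a model with weights pre-trained on ImageNet (the "large dataset").
    model = models.alexnet(pretrained=True)

    # Replace only the classifier head for the new task's label set (assumed
    # here to be 257 classes, as in Caltech-256); every other layer keeps its
    # transferred weights, so optimization starts from the pre-trained point.
    num_classes = 257
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def fine_tune_step(images, labels):
        """One optimization step of the internal-transfer training regime."""
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()

In the paper's terms, the pre-trained initialization shifts where the search for an optimum begins on the loss landscape, which is the internal transfer effect the abstract argues widens the effective search space.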
