关于深度学习的综述与讨论 (Overview on deep learning)
  • English title: Overview on deep learning
  • Authors: 胡越 (HU Yue); 罗东阳 (LUO Dongyang); 花奎 (HUA Kui); 路海明 (LU Haiming); 张学工 (ZHANG Xuegong)
  • Keywords (Chinese): 深度学习; 机器学习; 卷积神经网络; 递归神经网络; 多层感知器; 自编码机; 学习算法; 机器学习理论
  • Keywords (English): deep learning; machine learning; convolutional neural network; recursive neural network; multilayer perceptron; auto-encoder; learning algorithms; machine learning theory
  • Journal code: ZNXT
  • Journal title (English): CAAI Transactions on Intelligent Systems
  • Affiliations: Department of Automation, Tsinghua University; Institute of Information Technology, Tsinghua University; School of Life Sciences, Tsinghua University
  • Online publication date: 2018-10-26 10:59
  • Journal title (Chinese): 智能系统学报
  • Year: 2019
  • Volume/Issue: v.14; No.75
  • Funding: National Natural Science Foundation of China project (61721003)
  • Language: Chinese
  • Pages: 5-23
  • Number of pages: 19
  • Issue: 01
  • CN: 23-1538/TP
  • Database record code: ZNXT201901001
Abstract
Machine learning is a discipline that involves learning rules from data with computational models and algorithms. It has become one of the core technologies in the broader field of artificial intelligence and is useful in many applications that require mining rules from complex data. In recent years, various deep neural network models have achieved remarkable results on a large number of machine learning problems, giving rise to the most prominent new branch of machine learning: deep learning, which has set off a new wave of research on the theories, methods, and applications of machine learning. This article reviews the core principles and typical optimization algorithms of representative deep learning methods, discusses the relationships and differences between deep learning and earlier machine learning methods, and briefly discusses some problems in deep learning that call for further study.
