模型选择的曲率方法研究 (Research on Curvature Methods for Model Selection)
摘要
机器学习是人工智能的重要研究领域,模型选择是机器学习的重要研究内容。机器学习的许多实际问题中,需要从给定的有限观测数据推测产生这些数据的真实模型,而可能的模型往往有多个,从众多可能的模型中选择与未知的真实模型最匹配的模型就是模型选择。机器学习中的两大核心问题泛化和表示都与模型选择密切相关,因此模型选择是解决机器学习核心问题的关键。
     各种学习理论从不同角度研究模型选择问题,如经典的统计理论通过偏差-方差折中实现模型选择;解决逆问题的正则化方法使用正则化项惩罚复杂的模型;而统计机器学习理论通过对给定数据逼近的精度和逼近模型的复杂性之间进行折中来选择泛化性能较好的模型;随着微分几何在统计机器学习领域的发展应用,研究者使用几何方法研究统计模型的机器学习问题。这一理论把统计模型看成嵌入在所有可能分布构成的空间中的子流形,通过研究此子流形在空间的位置和形状信息对模型的拟合度和复杂度作出评估。这些理论和方法的共同之处是都体现了奥卡姆剃刀原则——“如无必要,勿增实体”,即泛化能力和模型在训练集的拟合度以及模型的复杂度相关。不同理论和方法的区别主要在于对复杂度的量化方法不同。
     本文通过曲率方法研究统计模型的几何性质,对模型选择中的几个关键问题包括泛化能力和参数不变性的复杂度度量、统计模型的整体性质及模型选择的统一框架进行研究。首先分析论证在模型流形空间中模型局部曲率的参数不变性和几何直观性,提出基于Gauss-Kronecker曲率的衡量模型局部性质的模型选择准则GKCIC;然后利用曲率的进一步运算得到模型流形的拓扑信息和整体几何信息,提出用Euler-Poincare拓扑信息衡量统计模型的全局性质的方法EPTIC;在此基础上提出基于曲率方法的模型选择统一框架,并结合知觉学习的特点,构建更加系统有效的层次化“记忆-预测”知觉学习模型。本文以微分几何为数学基础,用曲率的方法对模型选择及其在知觉学习中的应用进行了深入的研究,取得了一定的研究成果,并经过实验验证,为进一步的研究和应用奠定了基础。本文创造性的研究成果主要有:
     1.提出一种基于曲率的模型选择准则GKCIC(Gauss-Kronecker Curvature Information Criterion)。分析模型的泛化能力和其固有复杂度及其在训练集上的拟合度之间的关系;提出度量学习机器复杂度的Gauss-Kronecker内蕴曲率方法;给出基于参数估计量邻域内解轨迹的曲率计算方法,并分析正则化条件;证明用于衡量模型泛化能力的未来残差可以用曲率来表示;给出基于曲率的模型选择准则。该方法具有坚实的理论基础、内蕴几何性质和参数表示不变性,揭示了模型选择的内在本质,能直观清晰地理解模型选择的几何意义;实验表明,其效果明显优于参数相关的方法。
     2.提出一种基于拓扑信息衡量模型流形整体性质的方法EPTIC (Euler-Poincare Topology Information Criterion)。根据曲率和度量的互生关系,以曲率作为局部信息的基本几何构成元;通过对曲率的积分,得到统计流形的拓扑不变量Euler-Poincare示性数,作为统计流形的整体拓扑不变量;通过Gauss-Bonnet定理和Minkowski积分公式,得到体积等反映流形整体性质的拓扑和几何量;给出使用流形整体性质的模型选择方法,使模型具有全局的泛化能力;分析模型流形的拓扑性质和全局几何性质的重要意义,给出从局部性质得到整体性质的计算方法,实验表明其性能优于同类算法。
     3.提出基于几何曲率方法的模型选择统一框架。综合考虑统计模型的局部性质和整体性质,在前两章工作的基础上提出基于几何曲率方法的模型选择统一框架;讨论基于曲率的方法与统计学习理论之间的关系;在此统一框架下,结合知觉学习和认知心理学的研究成果,构建一种基于曲率和拓扑信息的层次化知觉学习计算模型。该模型通过自底向上的过程对局部信息进行抽象,得到全局拓扑和几何性质,作为整体先验知识;通过自顶向下的过程对输入信息进行预测、分析、比较,修正先验知识,指导下一次的预测;结合自底向上和自顶向下过程,使模型具有局部的特定性和全局的泛化能力。该框架综合考虑模型的局部和全局的泛化能力实现层次化抽象预测机制,体现知觉学习特定性和整体性的特点。
Machine learning is an important research area of artificial intelligence, and model selection is an important research topic within machine learning. In many practical machine learning problems, the task is to infer, from a finite set of observations, the true model that generated them. There are usually many candidate models, and model selection is the problem of choosing, among them, the model that best matches the unknown true model. The two core problems of machine learning, generalization and representation, are both closely related to model selection, so model selection is key to solving the core problems of machine learning.
     Various learning theories investigate model selection from different perspectives. Classical statistical theory selects models through the bias-variance trade-off. Regularization methods for solving inverse problems penalize complex models with a regularization term. Statistical learning theory selects models that generalize well to unseen data by trading off the accuracy of the fit to the given data against the complexity of the approximating model. With the development and application of differential geometry in statistical machine learning, researchers have begun to use geometric methods to study the machine learning problems of statistical models. This theory views a statistical model as a submanifold embedded in the space of all possible distributions, and evaluates the fit and complexity of the model from information about the position and shape of this submanifold in that space. What these theories and methods have in common is that they all reflect Occam's razor, 'Entities should not be multiplied beyond necessity': the generalization ability of a model is related both to its goodness of fit on the training data and to its complexity. They differ mainly in how they quantify complexity.
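     As background (an illustrative form, not a formula from the dissertation), the criteria above share the generic structure of a fit term plus a complexity penalty; for example, regularized risk minimization and the classical information criteria AIC and BIC can be written as

\[
J(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i, f(x_i)\bigr) + \lambda\,\Omega(f), \qquad
\mathrm{AIC} = -2\log\hat{L} + 2k, \qquad
\mathrm{BIC} = -2\log\hat{L} + k\log n,
\]

where $\ell$ is a loss function, $\Omega(f)$ a complexity penalty with weight $\lambda$, $\hat{L}$ the maximized likelihood, $k$ the number of free parameters, and $n$ the sample size. The methods differ precisely in how the complexity term is defined, which motivates the parameterization-invariant geometric measures of complexity studied here.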
     This dissertation uses curvature methods to study the geometric properties of statistical models and addresses several key issues in model selection: a complexity measure that reflects generalization ability and is invariant under reparameterization, the global properties of statistical models, and a unified framework for model selection. First, we analyse and prove that the local curvature of a model is invariant under reparameterization on the model manifold and has a clear geometric interpretation, and we propose GKCIC, a model selection criterion based on the Gauss-Kronecker curvature that measures the local properties of a statistical model. Then, since further computation with the curvature yields the topological and global geometric information of the model manifold, we propose EPTIC, a method that measures the global properties of statistical models through Euler-Poincare topological information. On this basis, we propose a unified curvature-based framework for model selection and, incorporating the characteristics of perceptual learning, construct a more systematic and effective hierarchical "memory-prediction" perceptual learning model. Taking differential geometry as the mathematical foundation, the dissertation studies model selection and its application to perceptual learning in depth; the proposed methods are verified by experiments and lay the groundwork for further research and application. The main contributions are summarized as follows:
     1. A curvature-based model selection criterion, GKCIC (Gauss-Kronecker Curvature Information Criterion), is proposed. The relationship between the generalization ability of a model, its intrinsic complexity, and its goodness of fit on the training data is analysed. A Gauss-Kronecker intrinsic curvature measure of the complexity of a learning machine is proposed. A method for computing the curvature from the solution locus in a neighbourhood of the parameter estimate is given, and the regularity conditions are analysed. It is proved that the future residual, which measures the generalization ability of a model, can be expressed in terms of the intrinsic curvature, and a curvature-based model selection criterion is derived from it. The criterion has a solid theoretical foundation, intrinsic geometric meaning, and invariance under reparameterization; it reveals the nature of model selection and gives an intuitive, clear geometric interpretation of it. Experimental results show that it clearly outperforms parameterization-dependent methods.
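     For reference, the geometric quantity behind GKCIC in its standard differential-geometric form (stated here for an $m$-dimensional hypersurface; the dissertation applies the curvature to the model manifold embedded in the space of distributions): with first fundamental form $g_{ij}$ and second fundamental form $h_{ij}$, the Gauss-Kronecker curvature at a point is the determinant of the shape operator, i.e. the product of the principal curvatures $\kappa_i$,

\[
K_{GK} = \det\bigl(g^{ik} h_{kj}\bigr) = \frac{\det(h_{ij})}{\det(g_{ij})} = \prod_{i=1}^{m} \kappa_i .
\]

Being defined from the fundamental forms of the manifold itself rather than from any particular coordinate chart, this quantity does not change under a reparameterization of the model, which is the invariance property the criterion relies on.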
     2. A method, EPTIC (Euler-Poincare Topology Information Criterion), which uses topological information to measure the global properties of the model manifold, is proposed. Based on the interdependence of curvature and metric, curvature is taken as the basic geometric element carrying local information. Integrating the curvature yields the Euler-Poincare characteristic, a topological invariant of the statistical manifold, which serves as its global topological invariant. Through the Gauss-Bonnet theorem and the Minkowski integral formulae, geometric quantities such as the volume, which reflect the global properties of the manifold, are obtained. A model selection method based on these global properties is given, so that the selected model has global generalization ability. The significance of the topological and global geometric properties of the model manifold is analysed, and a computational method for obtaining the global properties from the local ones is given. Experimental results show that the method outperforms comparable geometry-based algorithms.
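     The local-to-global step described above rests on the Gauss-Bonnet theorem cited in the text; in its classical form, for a compact, oriented two-dimensional manifold $M$ without boundary,

\[
\int_{M} K \, dA = 2\pi\,\chi(M),
\]

where $K$ is the Gaussian curvature and $\chi(M)$ the Euler-Poincare characteristic (higher-dimensional generalizations exist). Integrating a purely local quantity, the curvature, thus yields a global topological invariant, which is how EPTIC obtains a global description of the model manifold from local information.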
     3. A unified framework for model selection based on geometric curvature methods is proposed. Building on the work of the previous two chapters, the framework considers both the local and the global properties of a statistical model. The relationship between the curvature-based methods and statistical learning theory is discussed. Within this unified framework, and drawing on results from perceptual learning and cognitive psychology, a hierarchical computational model of perceptual learning based on curvature and topological information is constructed. In a bottom-up process, the model abstracts local information into global topological and geometric properties, which serve as prior knowledge; in a top-down process, it predicts the input, compares the prediction with the actual input, and revises the prior knowledge to guide the next prediction. By combining the bottom-up and top-down processes, the model is locally specific and globally generalizable. The framework considers local and global generalization ability together, realizes a hierarchical abstraction-prediction mechanism, and reflects the specificity and integrity of perceptual learning.
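     A minimal procedural sketch of the bottom-up/top-down cycle described above, assuming hypothetical callables abstract, predict, compare and revise that stand in for the curvature and topology computations of the actual model (an illustration, not code from the dissertation):

def memory_prediction_loop(inputs, abstract, predict, compare, revise, prior=None):
    """Hypothetical sketch of a hierarchical 'memory-prediction' cycle.

    abstract -- bottom-up map: local input -> global (topological/geometric) summary
    predict  -- top-down map: prior knowledge -> expected input
    compare  -- mismatch between the prediction and the actual input
    revise   -- update of the prior knowledge given the mismatch
    """
    for x in inputs:
        summary = abstract(x)                     # bottom-up abstraction of local information
        if prior is None:
            prior = summary                       # first pass: the summary becomes the prior
            continue
        expectation = predict(prior)              # top-down prediction from the prior
        mismatch = compare(expectation, x)        # compare the prediction with the real input
        prior = revise(prior, summary, mismatch)  # revise the prior to guide the next prediction
    return prior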
