Research on Multi-View Semi-Supervised Learning
Abstract
Learning is an important intelligent behavior of human beings, and imitating the human learning process is a principal goal of machine learning. Drawing on findings from physiology and cognitive science about the mechanisms of human learning, machine learning builds computational models of the learning process and studies general-purpose learning algorithms; it is one of the core research areas of artificial intelligence and neural computing.
     Data-driven machine learning builds models from observed data in order to predict unobserved or unseen data. With the arrival of the information age, data are abundant, but obtaining labels for the data costs human and material resources. Here a "label" is the output associated with a datum; in classification, for example, the label is the datum's class. Traditional supervised learning builds models from labeled data alone, and when labeled data are scarce the trained learner rarely performs well. Semi-supervised learning studies how to exploit large amounts of unlabeled data to improve learning performance when labeled data are limited; it has broad applications and is one of the hot topics in current machine learning research.
     In many practical machine learning problems the data have multiple views, and how to exploit these views jointly is a challenging research question. This dissertation studies multi-view semi-supervised learning, investigating in depth such key problems as its learning theory, its learning algorithms, and the construction of multiple views. The results are validated by extensive experiments and lay a foundation for further research and application.
     The main original contributions of this dissertation are:
     1. A regularization method for multi-view semi-supervised learning. Learning from finite samples is often an ill-posed inverse problem; the remedy is to constrain the learning process, a step known as regularization. For multi-view semi-supervised learning, we use the metric structure of the hypothesis space to define the smoothness and the consistency of a learned function. Smoothness is enforced during learning within each view, and consistency is enforced during the co-learning across views. We propose a two-level regularization algorithm that uses smoothness and consistency simultaneously, and we analyze its prediction error theoretically. Experiments show that the algorithm predicts significantly better than regularization using smoothness or consistency alone.
     2. A graph-based multi-view semi-supervised learning method. We analyze when graph representations are applicable, represent multi-view data with multiple graph structures, and extend graph-based semi-supervised learning to data with multiple views. The proposed algorithm performs semi-supervised learning on each graph and co-learning across the graphs, thereby optimizing the learners on all graphs simultaneously; the multi-graph learning process is also analyzed from a probabilistic perspective. Experiments show that the algorithm achieves higher classification accuracy than single-graph semi-supervised learning.
     3. A method for constructing and learning from multiple views in random subspaces. Random subspaces of the feature space are sampled, and the data are projected into them to construct multiple views. In the proposed multi-view semi-supervised learning algorithm, the unlabeled data on which each view's learner is most confident are used to train the learners of the other views, so that the learners of the views co-train one another. The algorithm is analyzed with stochastic discrimination theory. Experiments show that it outperforms comparable algorithms when the data have many features.
     4. An active learning method that reduces learner uncertainty, combined with multi-view semi-supervised learning. Following the idea of active learning, the unlabeled data about which the learner is least confident are selected for querying. During learning within each view, the most confident unlabeled data are used to train the learners of the other views, while the labels of the least confident unlabeled data are queried from an external oracle. Experiments show that the algorithm significantly improves learning performance.
Learning from examples is an important ability of human beings. The goal of machine learning is to simulate the human learning process. By applying results from neurophysiology and cognitive psychology to construct computational models and algorithms, machine learning aims to predict unseen examples, and it is an important part of artificial intelligence and neural computing.
     With the development of information technology, unlabeled examples are abundant while labeled examples are limited, because labeling examples requires human effort. The word "label" denotes the desired output of an example; in classification, for instance, it is the example's category. Traditional supervised learning needs a large number of labeled examples to construct a model and performs poorly when labels are scarce. Semi-supervised learning, which exploits unlabeled examples in addition to labeled ones to improve learning performance, has therefore become a hot research topic.
     Many problems in machine learning involve examples that naturally have multiple views. This dissertation studies several key problems of exploiting multiple views to learn effectively from labeled and unlabeled examples, including the theory and algorithms of multi-view semi-supervised learning, the construction of multiple views, and the combination of multi-view semi-supervised learning with active learning. The proposed methods are verified through extensive experiments.
     The main contributions of this dissertation are summarized as follows:
     1. We propose a new regularization method for multi-view semi-supervised learning. Learning from limited examples is an ill-posed inverse problem, for which regularization must be used. By exploiting the metric structure of the hypothesis space, we define the smoothness and the consistency of a hypothesis. A two-level regularization algorithm is presented that uses smoothness to regularize the within-view learning process and consistency to regularize the between-view learning process. The prediction error of the algorithm is analyzed, and encouraging experimental results are presented on both synthetic and real-world datasets.
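The abstract does not give the objective function itself; as a hedged illustration of the two-level idea, the sketch below regularizes two linear predictors with a weight-norm (smoothness) penalty inside each view and a prediction-disagreement (consistency) penalty across views, solved by gradient descent on a squared loss. All names, the loss, and the solver are assumptions for illustration, not the dissertation's actual algorithm.

```python
import numpy as np

def co_regularized_ls(X1, X2, y, labeled, gamma_s=0.01, gamma_c=1.0,
                      lr=0.05, steps=2000):
    """Illustrative two-level regularization: squared loss on labeled
    examples, a smoothness penalty (weight norm) within each view, and a
    consistency penalty tying the two views' predictions on all examples."""
    rng = np.random.default_rng(0)
    n = len(y)
    w1 = rng.normal(scale=0.01, size=X1.shape[1])
    w2 = rng.normal(scale=0.01, size=X2.shape[1])
    for _ in range(steps):
        f1, f2 = X1 @ w1, X2 @ w2
        err1, err2 = np.zeros(n), np.zeros(n)
        err1[labeled] = f1[labeled] - y[labeled]   # labeled loss, view 1
        err2[labeled] = f2[labeled] - y[labeled]   # labeled loss, view 2
        dis = f1 - f2                              # between-view disagreement
        g1 = X1.T @ (err1 + gamma_c * dis) / n + gamma_s * w1
        g2 = X2.T @ (err2 - gamma_c * dis) / n + gamma_s * w2
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

On synthetic data where both views are noisy copies of the same signal, the two penalties together drive the views to agree while fitting the few labeled points.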
     2. We propose a new graph-based multi-view semi-supervised learning method. Since a graph can represent the examples and the relationships between them, multiple graphs can represent multi-view examples. By extending graph-based semi-supervised learning to the multi-view setting, we present a multi-graph semi-supervised learning algorithm that uses unlabeled examples both to learn within each graph and to co-learn across graphs. Experimental results on real-world datasets show that our method is more accurate than graph-based single-view semi-supervised learning methods.
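One way to make the multi-graph idea concrete is a label-propagation-style update run jointly over several graphs, averaging the propagated soft labels across graphs each iteration and clamping the labeled examples. The kNN graph construction, the symmetric normalization, and every name below are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetrized k-nearest-neighbor adjacency matrix for one view."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(D)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]   # skip self (distance 0)
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)

def multi_graph_propagation(graphs, y, labeled, alpha=0.9, iters=50):
    """Propagate labels over every graph, average the results, and clamp
    the labeled examples; returns predicted labels for all examples."""
    n = len(y)
    classes = np.unique(y[labeled])
    Y = np.zeros((n, len(classes)))
    for i in labeled:
        Y[i, np.searchsorted(classes, y[i])] = 1.0
    # symmetrically normalize each adjacency matrix: D^{-1/2} W D^{-1/2}
    S = []
    for W in graphs:
        d = W.sum(axis=1)
        Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S.append(Dinv @ W @ Dinv)
    F = Y.copy()
    for _ in range(iters):
        F = np.mean([alpha * Si @ F for Si in S], axis=0) + (1 - alpha) * Y
        F[labeled] = Y[labeled]               # clamp labeled examples
    return classes[F.argmax(axis=1)]
```

With two well-separated clusters seen through two noisy views, one labeled example per class is enough to label both clusters.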
     3. We propose a new multi-view construction method. By projecting examples into random subspaces of the feature space, we construct multiple views of the original examples. A multi-view semi-supervised learning algorithm is presented that trains a classifier in each view and uses each classifier's most confident examples to train the other classifiers. Stochastic discrimination theory is used to analyze the algorithm's performance. Experimental results on real-world datasets show that our method is effective when features are abundant.
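A hedged sketch of the random-subspace construction and co-training loop: views are random feature subsets, a tiny nearest-centroid classifier (invented here only to keep the sketch self-contained) is trained per view, and each round every view hands its most confident pseudo-labeled examples to the other views' training pools. The base learner, the confidence measure, and all parameters are assumptions.

```python
import numpy as np

class Centroid:
    """Minimal nearest-centroid classifier; confidence is the gap between
    the distances to the two nearest class centroids."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def _dist(self, X):
        return ((X[:, None, :] - self.mu[None, :, :]) ** 2).sum(-1)
    def predict(self, X):
        return self.classes[self._dist(X).argmin(axis=1)]
    def confidence(self, X):
        d = np.sort(self._dist(X), axis=1)
        return d[:, 1] - d[:, 0]

def subspace_cotrain(X, y, labeled, unlabeled, n_views=2, ratio=0.5,
                     rounds=5, per_round=4, seed=0):
    """Construct views as random feature subsets, then co-train."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    views = [rng.choice(d, size=max(1, int(ratio * d)), replace=False)
             for _ in range(n_views)]
    L = [list(labeled) for _ in views]        # per-view training pools
    yL = [list(y[labeled]) for _ in views]
    U = list(unlabeled)
    for _ in range(rounds):
        if not U:
            break
        clfs = [Centroid().fit(X[np.ix_(L[v], views[v])], np.array(yL[v]))
                for v in range(n_views)]
        for v, clf in enumerate(clfs):
            Xu = X[np.ix_(U, views[v])]
            conf, pred = clf.confidence(Xu), clf.predict(Xu)
            top = np.argsort(conf)[-per_round:]   # most confident examples
            for t in sorted(top, reverse=True):   # pop high indices first
                i = U[t]
                for u in range(n_views):          # teach the other views
                    if u != v:
                        L[u].append(i); yL[u].append(pred[t])
                U.pop(t)
    return [Centroid().fit(X[np.ix_(L[v], views[v])], np.array(yL[v]))
            for v in range(n_views)], views
```

The final per-view classifiers can be combined by voting on their subspace projections.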
     4. We propose a new active learning method and combine it with multi-view semi-supervised learning. When the learner can interact with the environment, it can choose examples and query the user for their labels. By selecting the examples nearest to the classification hyperplane, we present an active learning algorithm that asks the user to label the examples on which the learner is least confident. We then incorporate this active learning process into multi-view semi-supervised learning: in each view, the most confident examples are selected to enlarge the training sets of the other views, while the least confident examples are queried. Experimental results on both synthetic and real-world datasets demonstrate that the proposed method improves classification performance significantly.
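The combination can be sketched as follows, assuming linear classifiers whose distance to the decision hyperplane measures confidence: each round, every view pseudo-labels its most confident unlabeled examples for the other view, and the globally least confident example's true label is queried from an oracle. The logistic-regression solver and all names are illustrative, not the dissertation's method.

```python
import numpy as np

def fit_linear(X, y, steps=500, lr=0.1):
    """Logistic regression by gradient descent (labels in {0, 1})."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def margin(w, b, X):
    """Unsigned distance-like score to the decision hyperplane."""
    return np.abs(X @ w + b)

def active_cotrain(X1, X2, y_oracle, labeled, unlabeled,
                   rounds=5, teach=2):
    """Co-training plus uncertainty-based querying; returns view 1's model."""
    L1 = {i: y_oracle[i] for i in labeled}    # view 1 training pool
    L2 = {i: y_oracle[i] for i in labeled}    # view 2 training pool
    U = list(unlabeled)
    for _ in range(rounds):
        if len(U) < 2 * teach + 1:
            break
        w1, b1 = fit_linear(X1[list(L1)], np.array(list(L1.values())))
        w2, b2 = fit_linear(X2[list(L2)], np.array(list(L2.values())))
        m1, m2 = margin(w1, b1, X1[U]), margin(w2, b2, X2[U])
        moved = set()
        for t in np.argsort(m1)[-teach:]:     # view 1 teaches view 2
            i = U[t]; L2[i] = float((X1[i] @ w1 + b1) > 0); moved.add(i)
        for t in np.argsort(m2)[-teach:]:     # view 2 teaches view 1
            i = U[t]; L1[i] = float((X2[i] @ w2 + b2) > 0); moved.add(i)
        q = U[int(np.argmin(m1 + m2))]        # least confident: query label
        L1[q] = L2[q] = y_oracle[q]; moved.add(q)
        U = [i for i in U if i not in moved]
    return fit_linear(X1[list(L1)], np.array(list(L1.values())))
```

Starting from one labeled example per class, the pools grow with a mix of pseudo-labels and queried labels, so the final classifier sees far more supervision than the initial set alone.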
