Research on Text Categorization Based on Regularized Linear Statistical Models
Abstract
Text is one of the most fundamental and widely used carriers of information. With the rapid development of information technology, the volume of text information is growing explosively. How to organize and manage this massive information, and how to find the required information quickly, accurately, and comprehensively, is a major challenge in information science and technology. Text categorization is a powerful means of organizing and managing text information, and it is also an important foundation of information retrieval and data mining.
     Building on a review of related work in text categorization, and drawing on regularized linear statistical models and their recent developments, this thesis studies several aspects of the problem: dimensionality reduction and feature representation, fast learning of classifiers, and models that keep dimensionality reduction consistent with classification. The major contributions are as follows:
     1. A text dimensionality reduction algorithm that fuses category information into non-negative matrix factorization (NMF) is proposed. Traditional NMF cannot easily exploit multi-label category information during dimensionality reduction, so this thesis fuses the category information of documents into the factorization through category coding and dimensionality extension, which improves robustness to interference and enforces the discriminability of the basis vectors. A constraint term then drives the basis vectors toward orthonormality to reduce their redundancy. Finally, through matrix truncation and transformation, documents are mapped from the high-dimensional term space into a low-dimensional semantic subspace spanned by a set of non-negative basis vectors. Experimental results show that the approach improves the discriminability of the basis and retains good classification performance even at very low dimensionality (a toy sketch of the label-fusion idea appears after this list).
     2. A non-negative sparse semantic coding method for text categorization is presented. Common dimensionality reduction methods produce dense representations that conflict with common sense, while popular sparse coding methods are time-consuming and may produce negative entries that are hard to interpret as text semantics. This thesis therefore develops an efficient dictionary construction algorithm whose dictionary is a set of non-negative basis vectors containing as many discriminative semantic concepts, and as little redundancy, as possible. In the semantic subspace spanned by this dictionary, every document is represented in a non-negative sparse form, matching the observation that a document usually covers only a few semantic concepts. Experimental results show that the method achieves good classification performance as well as better interpretability (see the coding sketch after this list).
     3. A text categorization method based on the extreme learning machine (ELM) is presented. The ELM is a machine learning method that has developed rapidly in recent years; its model can usually be obtained analytically, which avoids the convergence problems common in iterative training and yields a very high learning speed. To address the problems that arise when the ELM is applied to high-dimensional, sparse text data, this thesis constructs a regularized extreme learning machine (RELM), derives its analytical solution, and gives a theoretical proof that the solution exists. A classification method is then derived from the structure of the model. Experimental results show that the method outperforms back-propagation (BP) neural networks and is comparable to support vector machines in classification performance, while being far faster than both in learning and classification speed (see the RELM sketch after this list).
     4. A text categorization method based on a regularized regression model with grouped structure is proposed. Regression models with the lasso constraint can keep dimensionality reduction consistent with classification, but the correlation among text features often makes such models excessively sparse (i.e., many discriminative features are discarded). This thesis obtains a grouped structure of correlated features by clustering and embeds the structure into a logistic regression model as a regularizer. By enforcing sparsity both between and within groups, the model retains the important groups, even when the features inside them are highly correlated, while removing the noise within the selected groups; classification is then performed with the fitted model. Experimental results show that the method achieves a good balance between model sparsity and performance (see the sparse-group sketch after this list).
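
The following toy sketch (Python) illustrates only the category-coding and dimensionality-extension idea from contribution 1, not the thesis algorithm itself: the orthonormality-driving constraint term is replaced by a plain column normalization, and label_weight is a hypothetical knob controlling how strongly the labels steer the factorization.

    # Sketch: fuse multi-label information into NMF by stacking scaled
    # label-indicator rows onto the term-document matrix (dimensionality extension).
    import numpy as np
    from sklearn.decomposition import NMF

    def label_fused_nmf(X, Y, n_topics=40, label_weight=1.0):
        """X: (n_terms, n_docs) non-negative term-document matrix.
           Y: (n_labels, n_docs) binary multi-label indicator matrix."""
        X_aug = np.vstack([X, label_weight * Y])   # category coding + extra rows
        model = NMF(n_components=n_topics, init="nndsvd", max_iter=400)
        W_aug = model.fit_transform(X_aug)         # (n_terms + n_labels, n_topics)
        W = W_aug[: X.shape[0], :]                 # truncate back to the term space
        W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-12  # normalize basis vectors
        return W                                   # spans the semantic subspace

A new document d (a term vector) can then be projected into the subspace by non-negative least squares, e.g. scipy.optimize.nnls(W, d).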
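
The dictionary-construction algorithm of contribution 2 is not reproduced here; as a stand-in, given any non-negative dictionary W (for instance, the basis from the previous sketch), a non-negative sparse code can be computed with an off-the-shelf positive lasso, where alpha is an assumed sparsity knob.

    import numpy as np
    from sklearn.linear_model import Lasso

    def nn_sparse_code(W, d, alpha=0.01):
        """Return h >= 0 minimizing ||d - W h||^2 + alpha * ||h||_1.
           W: (n_terms, k) non-negative dictionary; d: (n_terms,) document vector."""
        coder = Lasso(alpha=alpha, positive=True, max_iter=5000)
        coder.fit(W, d)
        return coder.coef_   # mostly zeros: a document touches only a few concepts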
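
For contribution 3, the exact RELM formulation and existence proof are in the thesis; the sketch below is only a generic ridge-regularized ELM showing why training reduces to a single linear solve (the tanh activation and Gaussian random weights are arbitrary choices here).

    import numpy as np

    class RegularizedELM:
        """Random hidden layer + analytic ridge readout (a generic sketch)."""
        def __init__(self, n_hidden=1000, lam=1.0, seed=0):
            self.n_hidden, self.lam = n_hidden, lam
            self.rng = np.random.default_rng(seed)

        def fit(self, X, T):                  # X: (n, d); T: (n, c) one-hot targets
            d = X.shape[1]
            self.A = self.rng.standard_normal((d, self.n_hidden))
            self.b = self.rng.standard_normal(self.n_hidden)
            H = np.tanh(X @ self.A + self.b)  # random feature map
            # beta = (H^T H + lam*I)^{-1} H^T T; the ridge term lam*I makes the
            # matrix invertible, so a solution always exists
            self.beta = np.linalg.solve(H.T @ H + self.lam * np.eye(self.n_hidden),
                                        H.T @ T)
            return self

        def predict(self, X):                 # predicted class = argmax over columns
            return np.tanh(X @ self.A + self.b) @ self.beta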
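
Contribution 4 enforces sparsity both between and within groups; a minimal proximal-gradient sketch of sparse-group-lasso logistic regression follows. Here groups stands in for the output of the thesis's feature-clustering step, and lam1, lam2, and the fixed step size lr are assumed hyperparameters; the thesis's actual solver may differ.

    import numpy as np

    def sparse_group_logistic(X, y, groups, lam1=1e-3, lam2=1e-2, lr=0.1, iters=500):
        """Minimize logistic loss + lam1*||w||_1 + lam2*sum_g ||w_g||_2.
           X: (n, d); y: (n,) in {0, 1}; groups: index arrays partitioning features."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))       # sigmoid probabilities
            w -= lr * (X.T @ (p - y) / n)            # gradient step on the loss
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam1, 0.0)  # within-group l1
            for g in groups:                         # between-group l2 shrinkage
                norm = np.linalg.norm(w[g])
                w[g] = w[g] * (1.0 - lr * lam2 / norm) if norm > lr * lam2 else 0.0
        return w   # zeroed groups are dropped; zeros inside kept groups remove noise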
