Research on Term Co-occurrence Relations in Text Categorization and Its Applications
Abstract
In the age of network information, text categorization, as a fundamental technology for large-scale text processing, has broad application prospects. As research has deepened, text categorization technology has matured and entered the practical stage, and breakthrough innovations in the classification algorithms themselves have become increasingly difficult to achieve. Under these circumstances, it is of great significance for the development of text categorization research to find new entry points: to start from fundamental problems, solve the key technical issues, and then gradually apply the results to improving classifier performance.
     Starting from an analysis of document category features, we propose the study of term co-occurrence relations in text categorization and analyze such relations from two aspects: association and correlation. Several methods for applying term association and term correlation to text categorization are presented, which fall into two categories: the direct construction of text classification models based on association or correlation, and the indirect improvement of text classification models based on other techniques.
     The concepts of association and correlation derive from rule-interestingness measures in data mining. In this paper we apply them to text categorization and, adopting the general interpretation of correlation in statistics, analyze term correlation from both linear and nonlinear perspectives. The linear analysis covers the solution of linear equation parameters and the computation of the linear correlation coefficient, while the nonlinear analysis focuses on computing probability-based correlation measures.
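As a rough illustration (not code from the thesis itself), the two kinds of measure might be computed over toy document data as follows: a linear correlation coefficient between two terms' per-document frequency vectors, and a probability-based index in the style of lift, where values above 1 indicate positive correlation.

```python
import math

def pearson(x, y):
    """Linear correlation coefficient between two term-frequency vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def lift(docs, t1, t2):
    """Probability-based correlation: P(t1, t2) / (P(t1) * P(t2)).
    > 1: positively correlated; = 1: independent; < 1: negatively correlated."""
    n = len(docs)
    p1 = sum(t1 in d for d in docs) / n
    p2 = sum(t2 in d for d in docs) / n
    p12 = sum(t1 in d and t2 in d for d in docs) / n
    return p12 / (p1 * p2) if p1 and p2 else 0.0

# Toy data: each document is a set of terms.
docs = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly linear: 1.0
print(lift(docs, "a", "b"))           # > 1, "a" and "b" co-occur more than chance
```

The thesis's actual probabilistic indices may differ; lift is used here only as a representative measure from the rule-interestingness literature.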
     As for the application of association analysis to text categorization, we study associative text classification models and associative feature selection. An associative text classification model is a rule-based classification model. For mining classification rules, we propose a genetic algorithm based on variable neighborhood search (VNS-GA) for mining long frequent itemsets; for assigning documents to categories, we propose a discrimination algorithm based on rule match-length computation. In the study of associative feature selection, we summarize two feature-selection principles, category discrimination ability and document coverage, and present a selection method based on the union of frequent k-itemsets. Experiments on the Yahoo Chinese text dataset show that the proposed long-frequent-itemset mining algorithm can be applied effectively to the associative text classification model, and that the proposed associative feature selection markedly improves the performance of a naive Bayes (NB) text classifier.
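To make the frequent-itemset machinery concrete, here is a minimal level-wise (Apriori-style) miner with the "union of frequent k-itemsets" idea used as a feature set. This is a sketch under assumed interfaces, not the thesis's VNS-GA miner, which targets long itemsets that level-wise search handles poorly.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_k):
    """Level-wise mining of frequent term sets.
    docs: list of sets of terms; returns {frozenset: support_count}."""
    n = len(docs)
    items = {t for d in docs for t in d}
    current = {frozenset([t]) for t in items}   # 1-item candidates
    frequent = {}
    for k in range(1, max_k + 1):
        counts = {}
        for cand in current:
            c = sum(cand <= d for d in docs)    # docs containing the candidate
            if c / n >= min_support:
                counts[cand] = c
        frequent.update(counts)
        # Join step: build (k+1)-item candidates from frequent k-itemsets.
        freq_k = list(counts)
        current = {a | b for a, b in combinations(freq_k, 2) if len(a | b) == k + 1}
        if not current:
            break
    return frequent

def associative_features(docs, min_support, max_k):
    """Feature set as the union of all terms appearing in frequent k-itemsets."""
    sets = frequent_itemsets(docs, min_support, max_k)
    return set().union(*sets) if sets else set()

docs = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(associative_features(docs, 0.5, 2))  # terms covered by frequent itemsets
```

In the associative feature-selection setting, such mining would be run per category so that the surviving itemsets reflect the category-discrimination and document-coverage principles described above.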
     As for the application of correlation analysis to text categorization, our work covers a classifier based on linear least squares fit (LLSF), an ensemble classifier combining LLSF and NB, and an improved Bayesian classifier based on probabilistic correlation analysis. From experiments on the Reuters-21578 corpus, the following conclusions can be drawn. First, the LLSF classifier performs poorly, suggesting that linear relations among terms are weak in text classification and that a classifier built entirely on a linearity assumption may suffer considerable bias. Second, the LLSF+NB ensemble outperforms either classifier used alone, showing that LLSF, as a mature algorithm, retains value despite its modest standalone results. Finally, the improved Bayesian classifier based on probabilistic correlation analysis performs markedly and consistently better than NB across the evaluation metrics, verifying the effectiveness of the proposed term-set correlation measure for improving NB classification.
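The LLSF idea can be sketched in a few lines: learn a linear map W from the term-document matrix X to the one-hot category matrix Y by least squares, then classify a new document by the largest score under that map. The matrices below are toy data invented for illustration, not the Reuters-21578 setup.

```python
import numpy as np

# Toy term-document matrix: rows = documents, columns = terms.
X = np.array([[2., 0., 1.],
              [1., 0., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
# One-hot category matrix: rows = documents, columns = categories.
Y = np.array([[1., 0.],
              [1., 0.],
              [0., 1.],
              [0., 1.]])

# LLSF: find W minimizing the Frobenius norm of (X @ W - Y).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def classify(doc_vec):
    """Assign the category with the highest score under the linear map."""
    return int(np.argmax(doc_vec @ W))

# A document dominated by the first term should land in category 0.
print(classify(np.array([3., 0., 1.])))  # prints 0
```

An ensemble of the kind described above would then combine these linear scores with NB posterior estimates, e.g. by a weighted sum, before taking the argmax.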
