Research on Key Techniques of Text Semantic Representation and Hierarchical Classification
Abstract
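The rapid development of information technology and the fast evolution of the Internet have brought human society into an era in which information is extremely abundant and rapidly updated. With the emergence of various social networks in recent years in particular, massive amounts of text are produced and spread on the network every day. The problem people face is no longer how to obtain information, but how to extract the information they need quickly and effectively from a large volume of it. As a key technology of considerable practical value, text classification can largely resolve the problem of disordered information, helping users locate the information they need accurately and route information to where it belongs. As classification technology is widely applied in information retrieval, public opinion analysis, information filtering, news classification, digital libraries and other fields, research on key techniques of text classification has become a frontier topic in information processing, with broad application prospects and important research significance. This dissertation presents a systematic study of text semantic representation and key techniques of hierarchical classification. The main research results are as follows:
     1. A text representation model based on the text semantic graph is proposed. To address the loss of word semantic information in text representations based on word frequency statistics, this dissertation proposes a new Chinese text representation model, the text semantic graph, which takes into account the context and semantic background of the words in a text. With Wikipedia as background knowledge, the semantic relatedness between the content words of a text is computed, and words with strong semantic relations are merged into word-packages that serve as the nodes of the graph. The weight of a node is computed from the number of words in the word-package and their frequencies. Contextual relations between words in different word-packages form the directed edges of the graph, and the weight of a directed edge is the maximum weight of its adjacent nodes. The model preserves much of the contextual information of the words in a text while reinforcing their semantic content.
     2. A hierarchical text classification method based on a virtual category tree is proposed. Existing hierarchical classification methods build their classification models top-down, so that sample data are learned repeatedly. To address this problem, a hierarchical text classification method based on a virtual category tree is proposed, in which the classifiers are built bottom-up. During top-down classification, the similarity between the preprocessed document vector and each associated classifier is computed, and the maximum value determines the category of the document; this is repeated until the document is assigned to a leaf node.
     3. Incremental learning algorithms for hierarchical text classification are proposed. Based on an analysis of the learning problems posed by single-document adjustment and by newly added sample sets, incremental learning algorithms built on the hierarchical classification model are proposed for both modes. For single-document adjustment, the leftmost mismatching node between the classification path and the actual path is located, that node is relearned, and the virtual category tree classification model is updated. For a newly added sample set, an incremental feature selection algorithm updates the feature space incrementally, and the weights are recalculated to improve the accuracy of the classification model.
     4. A performance evaluation method for hierarchical text classification is proposed. To evaluate hierarchical text classification methods accurately, a set of extended evaluation measures that accurately describe hierarchical classification performance is proposed, making use of the hierarchical relations and the "affinity" relations between categories in the hierarchical classification structure. In addition, an error classification concentration ratio is defined from the distribution of misclassified samples. Besides evaluating classification results, it can guide the selection of training samples so that they are more representative.
     5. A process model for text information processing is designed. For the application mode of text intelligence processing, a process model of text information processing is designed, comprising four stages: text information collection, hotspot aggregation and classification, full-text information retrieval, and integrated compilation of text information. On this basis, a text information processing system is developed. The system supports preprocessing, analysis, and integrated compilation of text information, and provides information workers with a software platform that improves the efficiency of their work.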
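The construction of the text semantic graph in contribution 1 can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions made only for illustration: the `relatedness` dictionary stands in for the Wikipedia-based relatedness measure, the merge rule (connected components above a `threshold`) and the node weight (word count times total frequency) are hypothetical, and a "contextual relation" is taken to mean that one word immediately follows another in the token sequence.

```python
from collections import Counter


def build_text_semantic_graph(tokens, relatedness, threshold=0.5):
    """Return (packages, node_weights, edges) for a tokenized text.

    relatedness maps word pairs to a score; it is a hypothetical stand-in
    for a Wikipedia-based semantic relatedness measure.
    """
    freq = Counter(tokens)
    words = sorted(freq)

    # Union-find: merge words whose relatedness reaches the threshold.
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            score = relatedness.get((a, b), relatedness.get((b, a), 0.0))
            if score >= threshold:
                parent[find(a)] = find(b)

    # Each connected group of related words becomes one word-package (node).
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    packages = sorted(tuple(g) for g in groups.values())
    package_of = {w: p for p in packages for w in p}

    # Assumed node weight: number of words times their total frequency.
    node_weights = {p: len(p) * sum(freq[w] for w in p) for p in packages}

    # Directed edge when a word of one package is followed by a word of
    # another; edge weight is the larger weight of the two adjacent nodes.
    edges = {}
    for a, b in zip(tokens, tokens[1:]):
        src, dst = package_of[a], package_of[b]
        if src != dst:
            edges[(src, dst)] = max(node_weights[src], node_weights[dst])
    return packages, node_weights, edges
```

Calling `build_text_semantic_graph(tokens, relatedness)` on a segmented document yields the word-packages, their weights, and the weighted directed edges. A real system would replace the dictionary lookup with a relatedness score computed from Wikipedia.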
The rapid growth of information technology and rapid changes in the Internet have brought us into an information age that is both abundant and rapidly updated. Especially with the emergence of various social networks in recent years, massive amounts of text are produced and disseminated on these networks every day. "Information poverty" has been replaced by "information overload" as the mass of information grows rapidly. The problem we face is no longer how to get information, but how to extract the required information quickly and efficiently from a large amount of information. As a key technology of great practical value, text classification can to a large extent solve the problem of information disorder, and makes it convenient for users to locate the information they require accurately and to distribute information. With the wide application of classification technology in information retrieval, public sentiment analysis, information filtering, news classification, digital libraries and other areas, the study of key techniques of text classification has become a frontier subject of information processing, with broad application prospects and important research significance. This dissertation is mainly concerned with text semantic representation and key techniques of hierarchical classification. The author's major contributions are outlined as follows:
     1. A text representation model based on the Text Semantic Graph is proposed. To solve the loss of word semantic information caused by text representations based on word frequency statistics, a new Chinese text semantic representation model, the Text Semantic Graph, is proposed by considering the contextual semantics and background information of the words in the text. This method captures the semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package, represented by a graph node, which is weighted by the total number and frequency of the words it contains. A contextual relationship between words in different word-packages is represented by a directed edge, which is weighted with the maximum weight of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic meaning of the words.
     2. A hierarchical text classification method based on a virtual category tree is proposed. Existing hierarchical classification methods build the classification model top-down, which causes sample data to be learned repeatedly. To address this problem, a new hierarchical text classification method based on a virtual category tree is proposed. The method uses a bottom-up approach to build classifiers, which decreases the cost of repetitive sample learning and reduces learning time. In the top-down text classification process, the similarity between the preprocessed document vector and each associated classifier is calculated. The maximum value is selected to determine the category to which the document belongs, until the document is classified to a leaf node.
     3. Incremental learning algorithms for hierarchical text classification are proposed. Based on an analysis of the learning problems of single-document adjustment and new sample sets, incremental learning algorithms built on the hierarchical classification model are proposed for the two patterns. For single-document adjustment, the classifier at the leftmost mismatching node between the document's classification path and its actual path in the virtual category tree is retrained, and the virtual category tree model is then updated. For new sample sets, the feature space is updated incrementally using an incremental feature selection algorithm, and the weights are recalculated to improve the accuracy of the classification model.
     4. A performance evaluation method for hierarchical text classification is proposed. To evaluate hierarchical classification methods and overcome the limitations of conventional flat classification measures in hierarchical classification evaluation, a set of extended measures is put forward, following a study of hierarchical classification methods based on the concept tree. The measures describe hierarchical classification performance accurately by making effective use of the level and "affinity" relationships among the categories in a hierarchical structure. Furthermore, the Error Classification Concentration Ratio (ECCR) is defined based on the distribution of misclassified samples. Besides evaluating the classification result, ECCR can guide the selection of training samples to make the training set more representative.
     5. A text information processing model is designed. For the application mode of text intelligence processing, a process model of text information processing is designed, comprising four stages: text information collection, hotspot aggregation and classification, full-text information retrieval, and integrated compilation of text information. On this basis, a text information processing system is developed. The system supports the preprocessing, analysis, and integrated compilation of text information, and provides a software platform for information workers to improve the efficiency of information processing.
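The classification procedure of contribution 2 and the path comparison used in contribution 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the dissertation's algorithm: each node's classifier is assumed to be a centroid vector, the bottom-up build is assumed to average child centroids so that each training document is read only once, cosine similarity is assumed as the similarity measure, and the "leftmost mismatching node" is interpreted as the first position at which the predicted and actual root-to-leaf paths differ. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, x in vec.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in total.items()}


def build_centroids(tree, leaf_docs, node="root", centroids=None):
    """Bottom-up build (assumed): leaves average their documents,
    inner nodes average the centroids of their children."""
    if centroids is None:
        centroids = {}
    children = tree.get(node, [])
    if children:
        for child in children:
            build_centroids(tree, leaf_docs, child, centroids)
        centroids[node] = mean([centroids[c] for c in children])
    else:
        centroids[node] = mean(leaf_docs[node])
    return centroids


def classify(tree, centroids, doc, node="root"):
    """Top-down: at each level follow the most similar child until a leaf."""
    path = []
    while tree.get(node):
        node = max(tree[node], key=lambda c: cosine(doc, centroids[c]))
        path.append(node)
    return path


def leftmost_mismatch(predicted, actual):
    """Index of the first level where the two paths differ, or None."""
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return i
    return None
```

In a single-document adjustment, the node at the index returned by `leftmost_mismatch` would be the one whose classifier is retrained. How that retraining updates the virtual category tree is not specified in the abstract and is left out of the sketch.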
