频繁子结构挖掘算法研究与应用

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

频繁子结构挖掘算法研究与应用

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

英文题名：Research and Application on Frequent Substructure Mining Algorithms
作者：李海波
论文级别：博士
学科专业名称：计算机软件与理论
中文关键词：频繁子结构挖掘 ; PJE算法 ; Root-PJE算法 ; Free-PJE算法 ; Graph-PJE算法 ; 图核-分支-连接向量三元组 ; 频繁子图索引
英文关键词：frequent substructure mining ; PJE Algorithm ; Root-PJE Algorithm ; Free-PJE Algorithm ; Graph-PJE Algorithm ; 3-tuple of core-branch-connction-vector ; frequent subgraph index
学位年度：2011
导师：王元珍
学科代码：081202
学位授予单位：华中科技大学
论文提交日期：2010-12-01

摘要

在新兴的化学信息学、生物信息学,网络分析、XML数据等领域,需要用树或图这样的结构化数据类型来表示数据。在这些结构化数据类型中进行数据挖掘,将有助于我们获取新的信息和知识。在结构化类型的数据集合中,频繁项的挖掘是一种最基本的数据挖掘方式,如何高效地挖掘频繁子结构模式,是一个挑战性的问题。
     已有的高效频繁子结构挖掘算法的核心思想可以大致分为基于Apriori原则的连接方法和基于模式增长的扩展方法,但前者难以直接生成规范化的候选模式,后者又容易产生候选模式的数目过大。通过分析这两类方法的特点,提出了混合型PJE方法,该方法是研究频繁有根无序树挖掘、频繁自由树挖掘和频繁子图挖掘的基础。
     在频繁有根无序树的挖掘中,采用最小深度序列作为规范化标记形式,并且基于前缀结点进行扩展,在常数时间内得到新的规范化形式的候选模式树。采用深度扩展和广度连接的混合方式列举候选模式树,利用Apriori原则减少候选模式树的数目。对列举生成的候选模式树,利用Apriori原则进行剪枝,进一步减少需要进行频度统计的候选模式树数目。用规范化的嵌入出现列表表示模式树在数据库中的出现,在此基础上进行出现频度统计,不仅避免完整的子图同构判断问题,而且比使用完整出现列表节约了大量空间。综合以上技术,给出了频繁有根无序树挖掘算法Root-PJE,并且在人工数据集和真实数据集上进行性能测试,验证了性能比现有算法有较大提高。
     在频繁自由树的挖掘中,定义自由树的中心结点或双中心结点,将自由树转换为以中心结点为根的有根无序树。基于自由树的脊柱路径和最小脊柱串,定义自由树的脊柱串优先最小深度序列,在此基础上运用前缀结点进行深度扩展和广度连接,在常数时间内得到新的候选模式自由树。对候选模式自由树采用Apriori原理进行剪枝,并采用规范化嵌入出现列表进行频度统计。综合以上方法,给出频繁自由树挖掘算法Free-PJE,并且在人工数据集和真实数据集上进行性能测试,验证了性能比现有算法有较大提高。
     在频繁子图的挖掘中,将图分解为不包含叶结点的图核部分和不包含环的分支森林部分,定义分支森林在图核上的连接向量。由此定义最小“图核-分支-连接向量”三元组作为图的规范化标记形式。以扩展方法得到频繁模式图核,对一个图核由列举得到所有最小连接向量,由此将图看做是虚拟有根无序树,在此虚拟树上进行基于前缀结点的深度扩展和广度连接,从而在常数时间内得到新的候选模式图。采用基于Apriori原理的剪枝和基于规范化嵌入出现列表的出现频度统计。基于以上方法,给出频繁子图挖掘算法Graph-PJE。在人工数据集和真实数据集上进行了性能测试,验证了性能比现有算法有较大提高。
     为了提高图查询的效率,需要在图数据库中建立图索引。利用图数据库中的特征子图和其事务出现列表建立图索引。查询时,首先利用图索引得到查询图的候选查询结果集,然后验证每个候选结果图是否完整包含查询图。使用频繁子图挖掘结果作为图索引,可以保证候选查询结果集不大于频繁挖掘中的最小支持度。使用共享前缀树保存索引特征子图,只需保存有效事务出现列表,可以减小图索引的大小。在真实的分子结构图数据库中,将6边环和5边环看做虚拟原子,对分子结构图进行重构后建立图索引,可以大幅减小图索引的大小。利用真实数据集进行测试,验证了频繁子图索引的高效。
     利用新提出的频繁子结构索引和查询方法,以达梦关系数据库管理系统为平台,设计并实现了化学数据库系统的原型。在该数据库中,利用关系表存储化学结构数据和化学结构索引,利用外部存储过程,实现了化学结构数据的存储、索引、查询以及挖掘功能。
In the field of chemical information, biloloy information, web analysis, XML data emerging recently, structural data types, such as trees and graphs, are needed to reprent infomation. Data mining in these structural data types, is helpful to gain information and knowledge. Mining frequent itemset is one most primary method of data mining. How to efficiently mine out the frequent substructure patterns in the data set of structural type data, is a challenging problem.
     By analyzing of existing efficient algorithm for mining frequent substructures, we found that the core idea of these algorithms can be divided into the joining method based the Apriori principle and the extension method based on pattern growth. By analyzing and absorbing the two methods, we proposed a new hybrid PJE method. Applied PJE to different types of structured data for mining frequent patterns, we propose new efficient algorithms for mining frequent substructure. Finally we proposed a new method for substructure indexing, and implemented a chemical structure database prototype system in a relational DBMS.
     During minig frequent rooted unordered trees, minimum depth first sequences are used as canonical forms of pattern trees. By extending based on prefix nodes of trees, new candidated pattern trees can be gained in const time. Enumerating candidated pattern trees by depth extending and width joining, the Apriori rule can be utilized to reduce the number of candidated pattern trees. For the pattern trees been enumerated, the Apriori rule can still be used to prune, so the number of candidated pattern trees can be cut down further. Using canonical embedded occurring list to record the occurrence of pattern trees in graph databases, and from it to count the frequency of occurring of pattern trees, will not only avoid complete subgrah isomorphism testing, but also save large memory. Putting all discussed above together, a frequent rooted unorderd trees mining algorithm, called PJE, is proposed. By performance testing in both synthetic dataset and real data set, it is proved that the efficiency of substructure data minig has been improved in some degree.
     In the minig of frequent free trees, centre nodes and bicentre nodes of free trees are defined first. Free trees can be transformed into rooted unordered trees by using the centre of bicentre nodes as root nodes. By finding out backbone path and backbone string in free trees, minimum backbone-first depth-first sequences of free trees are defined. This kind of sequences are used as canonical forms of free trees. Based on this type of canonical forms, depth extending and width joining using prefex nodes can be done on the pattern free trees. New candidated pattern free trees can be acquired in const time by the mthod. For the pattern free trees been enumerated, the Apriori rule can still be used to prune, and the frequency of occurring of pattern free trees can counted by using canonical embedded occurring list. Base on all methods discussed above, a frequent free trees mining algorithm, Free-PJE, is proposed. By performance testing in both synthetic dataset and real data set, it is proved that the efficiency of substructure data minig has been improved in great degree.
     In the mining of frequent subgraphs, a graph is devided into a graph core which not containing leaf nodes, and a branch forest which not containing ring structures. The connction vector specifys the nodes where the branch subtrees are conneted on the graph core. The minimum“core-branch-conection vector”3-tuples are used as the canonical forms of graphs. Frequent pattern graph core are gained by patten extending , and all minimum connection vectors are enumerated at the same time. By a specified graph core and it’s minimum connection vectors, graphs can be considered as a virtual rooted unordered trees. depth extending and width joining using prefex nodes can be done on the virtual trees, and new candidated pattern graphs can be gained in constant time. Prune based on the Apriori rule and and counting of frequency of occurring using canonical embedded occurring list can be done. A frequent subgraph mining algorithm, Graph-PJE, is proposed. By performance testing on both synthetic dataset and real data set, it is proved that the efficiency of substructure data minig has been improved greatly.
     To improve the response time of graph query, it’s needful to build graph index in graph database. Feature subgraphs in the graph database and their transaction occurring list can be used to build graph index. Graph query acquired candidated query result set by graph index and query graph, then each of the candidated result graphs is verified whether contains the query graph completely. The result of frequent subgraphs mining can be used as graph index, and the number of the candidated query result set can be preserved less than the minimum support in the frequent subgraphs mining. Common prefix index tree and effective transaction occurring list can be used to reduce the size of the frequent subgraph index. In real mocular structure databases, 6-rings and 5-rings can be regarded as virtual atoms. After the mocular structure database been reconstructed, the size of the frequent subgraph index can be reduced further greatly. By experiments on real data sets, it’s proved that the frequent subgraph indexes have high performance.
     Using the new frequent substructure indexing and querying methods, we designed and implemented a chemical database prototype system in DM DBMS. In DM relational database, the chemical structure data and chemical structure index data has been stored by using relational tables; the functions to store, index, query chemical structure of data has been implemented by using external stored procedures.

引文

[1] Strug B., Slusarczyk G. Reasoning about Designs through Frequent Patterns Mining. Advanced Engineering Informatics, 2009(23): 361~369
    [2] Washio T., Raedt L. D., Kok J. N. Advances in Mining Graphs, Trees and Sequences. Fundamenta Informaticae, 2005(66): v-viii
    [3] Lippi M. Statistical Learning for Relational and Structured Data: Doctor Thesis. Florence, Italy: University of Florence, 2010.
    [4] Han J., Pei J., Yan X. From Sequential Pattern Mining to Structure Pattern Mining. Journal of Computer Science and Technology, 2004(19): 257~279
    [5] Holder L. B., Yan X. Report on the First International Workshop on Mining Graphs and Complex Structures. In: Proceedings of the 1st International Workshop on Mining Graphs and Complex Structures. Omaha, Nebraska, USA. 2007. Washington, D.C.: IEEE Computer Society, 2007. 53~56
    [6] Washio T., Motoda H. State of the Art of Graph-based Data Mining. ACM SIGKDD Explorations Newsletter, 2003(5): 59~68
    [7] Vanetik N., Gudes E., Shimony S. E. Computing Frequent Graph Patterns from Semistructured Data. Proceedings of the 2002 IEEE International Conference on Data Mining. Maebashi City, Japan. 2002. Washington, D.C.: IEEE Computer Society, 2002. 458~465
    [8] Wang K., Liu H. Discovering Typical Structures of Documents: A Road Map Approach. Proceedings of the 21st annual international ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, 1998. New York: ACM Press, 1998. 146~154
    [9] Zantema H., Wagemans S., Bosnacki D. Finding Freqent Subgraphs in Biological Networks Via Maximal Item Sets. In: M. Elloumi, J. Kung, M. Linial, et al eds.. Proceedings of Bioinformatics Research and Development 2nd International Conference. Venna, Austria. 2008. Berlin: Springer-Verlag, 2008. 303~317
    [10] Helgee E. A. Improving Drug Discovery Decision Making using Machine Learningand Graph Theory in QSAR Modeling: Doctor Thesis. Gothenburg, Sweden: Department of Chemistry, University of Gothenburg, 2010.
    [11] Gasteiger J., Engel T.化学信息学教程.北京:化学工业出版社, 2005. 320~377.
    [12] You C. H., Holder L. B., Cook D. J. Graph-based Data Mining in Dynamic Networks: Empirical Comparison of Compression-based and Frequency-based Subgraph Mining. In: Proceedings of the 2008 IEEE International Conference on Data Mining. Pisa, Italy. 2008. Washington, D.C.: IEEE Computer Society, 2008. 219~226
    [13] Kuramochi M., Karypis G. Discovering Frequent Geometric Subgraphs. Information Systems, 2007,32: 1101~1120
    [14] Nowozin S., Tsuda K. Frequent Subgraph Retrieval in Geometric Graph Databases. Technical Report No. 180, Max Planck Institute for Biological Cybernetics, Tubingen, Germany, 2008. 1~13
    [15] Han J., Kamber M. Data Mining Concepts and Techniques (Second Edition). San Francisco, CA, US: Morgan Kaufmann Publishers, 2006. 535~590.
    [16] Bordino I., Donato D., Gionis A. Mining Large Networks with Subgraph Counting. Proceedings of the 2008 IEEE International Conference on Data Mining. Pisa, Italy. 2008. Washington, D.C.: IEEE Computer Society, 2008. 737~742
    [17] Lahiri M., Berger-Wolf T. Y. Mining Periodic Behavior in Dynamic Social Networks. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. Hong Kong, China. 2006. Washington, D.C.: IEEE Computer Society, 2006. 52~58
    [18] Zaki M. J. Fast Vertical Mining Using Diffsets. In: L. Getoor, T. E. Senator, P. Domingos, et al eds.. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC. USA. 2003. New York: ACM Press, 2003. 326~335
    [19] Thoma M., Cheng H., Gretton A., et al. Near-optimal supervised feature selection among frequent subgraphs. Proceedings of the 2009 SIAM Conference on Data Mining. Sparks, Nevada, USA. 2009. Philadelphia: SIAM, 2009. 1075–1086
    [20] Cormen T. H., Leiserson C. E., Rivest D. L., et al.. Introduction to Algorithms.Cambridge, MA, USA: The MIT Press, 2001.
    [21]潘金贵,顾铁成,曾俭,滕远方.现代计算机常用数据结构和算法.南京:南京大学出版社, 1994. 319~342.
    [22] Messmer B. T., Bunke H. Subgraph Isomorphism in Polynomial Time. Technical Report, IAM 95-003, Institute of Computer Science and Applied Mathematics, Unversity of Bern, Bern, Switzerland, 1995. 1~33
    [23] Fortin S. The Graph Isomorphism Problem. Technical Report TR 96-20, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, 1996. 1~25
    [24] Luccio F., Enriquez A. M., Rieumont P. O., et al. Button-up Subtree Isomorphism for Unorderd Labeled Trees. Technical Report TR-04-13, Department of Information, University of Pisa, Pisa, Italy, 2004. 1~20
    [25] Luccio F., Enriquez A. M., Rieumount P. O., et al. Exact Rooted Subtree Matching in Sublinear Time. Techical Report TR-01-14, Department of Informatica, University Pisa, Pisa, Italy, 2001. 1~10
    [26] Chi Y., Nijssen S., Muntz R. R., et al. Frequent Subtree Mining an Overview. Fundamenta Informaticae Special Issue on Graph and Tree Mining, 2005(16): 1001~1037
    [27] Asai T., Arimura H., Uno T., et al. Discovering Frequent Substructures in Large Unordered Trees. TECHNICAL REPORT DOI-TR-CS 216, Department of Infomatics, Kyushu University, Fukuoka, Japan, 2003. 1~15
    [28] Nijssen S., Kok J. N. Efficient Discovery of Frequent Unordered Trees. In: N. Lavrac, D. Gamberger, H. Blockeel, et al eds.. Proceedings of the 1st International Workshop on Mining Graphs, Trees and Sequences. Cavtat-Dubrovnik, Croatia. 2003. Berlin: Springer-Verlag, 2003. 55~64
    [29] Chi Y., Yang Y., Muntz R. HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Form. Technical Report NCSC-005, Department of Computer Science, University of Califormia, Los Angeles, CA, USA, 1996: 1~10
    [30] Xiao Y., Yao J., Li Z., et al. Efficient Data Mining for Maximal Frequent Subtrees.Proceedings of the 2003 IEEE International Conference on Data Mining. Melbourne, Florida, USA. 2003. Washington, D.C.: IEEE Computer Society, 2003. 379~386
    [31] Han J., Pei J., Yin Y. Mining Frequent Patterns without Candidate Generation. Data Mining and Knowledge Discovery, 2004, 8: 53~87
    [32] Ruckert U., Kramer S. Frequent Free Tree Discovery in Graph Data. In: H. Haddad, A. Omicini, R. L. Wainwright, et al eds.. Proceedings of the 2004 ACM Symposium on Applied Computing. Nicosia, Cyprus. 2004. New York: ACM Press, 2004. 564~570
    [33] Nijssen S., Kok J. N. A quickstart in frequent structure mining can make a difference. In: W. Kim, R. Kohavi, J. Gehrke, et al eds.. Porceedings of the ACM SIGKDD international conference on Knowledge discovery and datamining. Seattle, WA, USA. 2004. New York: ACM Press, 2004. 647~652
    [34] Inokuchi A., Washio T., Motoda H. A General Framework for Mining Frequent Subgraphs from Labeled Graphs. Fundamenta Informaticae, 2005,(66): 53~82
    [35] Kukluk J. P., Holder L. B., Cook D. J. Inferring Graph Grammars by Detecting Overlap in Frequent Subgraphs. International Journal of Applied Mathematics and Computer Science, 2008(18): 241~250
    [36] Nguyen S. N., Orlowska M. E., Li X. Graph Mining based on a Data Partitioning Approach. In: A. Fekete, X. Lin eds.. Proceedings of the 19th Conference on Australasian Database. Gold Coast, Australia. 2008. New York: ACM Press, 2008. 31~37
    [37] Inokuchi A., Washio T., Motoda H. An Apriori-based Algorithm for Mining Frequent Substructure from Graph Data. In: D. A. Zighed, H. J. Komorowski, J. M. Zytkow eds.. Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery. Lyon, France. 2000. Berlin: Springer-Verlag, 2000. 13~23
    [38] Kuramochi M., Karypis G. Frequent Subgraph Discovery. Proceedings of the 2001 IEEE International Conference on Data Mining. San Jose, California, USA. 2001. Washington, D.C.: IEEE Computer Society, 2001. 313~327
    [39] Inokuchi A., Washio T., Nishimura K., et al. A Fast Algorithm for Mining Frequent Connected Subgraphs. Research Report RT0448, IBM Tokyo Research Laboratory,Tokyo, Japan, 2002. 1~11
    [40] Yan X., Han J. gSpan: Graph-Based Substructure Pattern Mining. Proceedings of the 2002 IEEE International Conference on Data Mining. Maebashi City, Japan. 2002. Washington, D.C.: IEEE Computer Society, 2002. 721~725
    [41] Yan X., Han J. gSpan: Graph-Based Substructure Pattern Mining. Technical Report, Department of Computer Science, University of Illunois at Urbana-Champaign, 2002. 1~26
    [42] Huan J., Wang W., Prins J. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. Proceedings of the 2003 IEEE International Conference on Data Mining. Melbourne, Florida, USA. 2003. Washington, D.C.: IEEE Computer Society, 2003. 549~553
    [43] McKay B. D. Practical Graph Isomorphism. Congressus Numerantium, 1981(30): 45~87
    [44] Chen Q., Lim A., Ong K. W. D(K)-Index: An Adaptive Structural Summary for Graph-Structured Data. In: A. Y. Halevy, Z. G. Ives, A. Doan eds.. Proceedings of the 2003 ACM SIGMOD international conference on Management of data. San Diego, California, USA. 2003. New York: ACM Press, 2003. 134~144
    [45] Shasha D., Wang J. T. L., Giugno R.. Algorithmics and Applications of Tree and Graph Searching. In: M. J. Franklin, B. Moon, A. Ailamaki eds.. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, Wisconsin, 2002. New York: ACM Press, 2002. 39~52
    [46] Yan X., Yu P. S., Han J. Graph Indexing: A Frequent Structure-based Approach. In: G. Weikum, A. C. Konia, S. Debloch eds.. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris, France. 2004. New York: ACM Press, 2004. 335~346
    [47] Yan X., Yu P. S., Han J. Substructure Similarity Search in Graph Databases. In: F. Ozcan eds.. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, Maryland, USA. 2005. New York: ACM Press, 2005: 766~777
    [48] Conte D., Foggia P., Sansone C., M. Vento. Thirty Years of Graph Matching inPattern Recognition. International Journal of Pattern Recognition and Artificial Intelligence, 2004(18): 265~298
    [49] Babal L., Erdos P., Stanley M. Selkow. Random Graph Isomorphism. SIAM Journal of Computing, 1980(9): 628~635
    [50]王艳辉,吴斌,王柏.频繁子图挖掘算法综述.计算机科学, 2005,(32): 193~197
    [51]洪帆.离散数学基础.武汉:华中科技大学出版社, 1994. 230~267.
    [52] Alonso A. G., Pagola J. E. M., Ochoa J. A. C., et al. Mining Frequent Connected Subgraphs Reducing the Number of Candidates. In: W. Daelemans, B. GoeThals, K. Morik eds.. Proceedings of the 12th European Conference on Principles of Data Mining and Knowledge Discovery. Antwerp, Belgium. 2008. Berlin: Springer-Verlag, 2008. 365~376
    [53] Vijayalakshmi R., Nadarajan R., Nirmala P., et al. A Novel Approach for Detection and Elimination of Automorphic Graphs in Graph Databases. International Journal of Open Problems in Computer Science and Mathematics, 2010,(3): 56~72
    [54] Ullmann J. R. An Algorithm for Subgraph Isomorphism. Journal of the ACM, 1976(23): 31~42
    [55] Zou L., Chen L., Yu J. X., et al. A Novel Spectral Coding in a Large Graph Database. In: A. Kemper, P. Valduriez, N. Mouaddib, et al eds.. Proceedings of the 11th international conference on Extending database technology: Advances in database technology. Nantes, France. 2008. New York: ACM Press, 2008. 181~192
    [56] Borgelt C. Canonical Forms for Frequent Graph Mining. In: S. Nijssen, T. Meinl, G. Karyis eds.. Proceedings of the 3rd International Workshop on Mining Graphs, Trees and Sequences. Porto, Portugal, 2005. Berlin: Springer-Verlag, 2005. 1~12
    [57] Wu J., Chen L. Mining Frequent Subgraph by Incidence Matrix Normalization. Journal of Computers, 2008(3): 109~116
    [58] McKay B. D. Nauty User’s Guide. Department of Computer Science, Australia National University, Canberra, Australia. 2009.
    [59] Ramon J., Nijessen S. Polynomial-Delay Enumeration of Monotonic Graph Classes. The Journal of Machine Learning Research, 2009(10): 907~929
    [60] Agrawal R., Srikant R. Fast Algorithms for Mining Association Rules. ResearchReport RJ9839, IBM Almaden Research Center, San Jose, Ca, 1994. 1~32
    [61] Han J., Kamber M.数据挖掘概念与技术.北京:机械工业出版社. 2004.
    [62] Goldman R., Widom J. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In: M. Jarke, M. J. Garey, K. R. Dittrich et al eds.. Proceedings of 23rd International Conference on Very Large Data Bases. Athens, Greece. 1997. San Francisco: Morgan Kaufmann, 1997. 436~445
    [63] Yamamoto T., Ozaki T., Okawa T. Discovery of Frequent Graph Patterns that Consist of the Vertices with the Complex Structures. In: J. N. Kok, J. Koronacki, R. L. Montaras et al eds.. Proceedings of the 3rd ECML/PKDD International Conference on Mining Complex Data. Warsaw, Poland. 2007. New York: ACM Press, 2007. 143~156
    [64] Aho, A. V., Hopcroft, J. E., Ullman, et al: The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974. 77~89.
    [65] Zhao P., Yu J. X. Fast Frequent Free Tree Mining in Graph Databases. Proceedings of the 2nd International Workshop on Mining Complex Data. Hong Kong, China. 2006. Washington, D.C.: IEEE Computer Society, 2006. 71~92
    [66] Nijssen S., Kok J. N. A quickstart in frequent structure mining can make a difference. Technical Report NCSC005, Hinterzarten, Germany, 2004. 1~11
    [67] Nakano S., Uno T. A Simple Constant Time Enumeration Algorithm for Free Trees. PSJ SIGNotes Algorithms. 2003,(26): 256~263
    [68] Kuramochi M., Karypis G. An Efficient Algorithm for Discovering Frequent Subgraphs. Technical Report 02-026, Department of Computer Science/Army HPC Research Center, University of Minnesota, Minneapolis, MN, USA, 2002. 1~27
    [69] Sedgewick R. C算法(第二卷图算法).北京:人民邮电出版社, 2004. 155~178.
    [70] Weiss M. A.数据结构与算法分析——C语言描述.北京:机械工业出版社. 2004. 55~75.
    [71] Srinivasan A., King R. D., Muggleton S. H., et al. The Predictive Toxicology Evaluation Challenge. In: A. L. Ralescu, J. G. Shanahan eds.. The Proceedings of the 15th International Joint Conference on Aritificial Intelligence. Nagoya, Japan. 1997. Berlin: Springer-Verlag, 1997: 4~9
    [72] Hu X., Chiueh T., Shin K. G. Large-Scale Malware Indexing Using Function-Call Graphs. In: E. A. Shaer, S. Jha, A. D. Keromytis eds.. Proceedings of the 16th ACM Conference on Computer and Communications Security. Chicago, Illinois, USA. 2009. New York: ACM Press, 2009. 611~620
    [73] Berretti S., Bimbo A. D., Vicario E. Efficient Matching and Indexing of Graph Models in Content-Based Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001(23): 1089~1105
    [74] Chen C., Yan X., Yu P. S., et al. Towards Graph Containment Search and Indexing. Proceedings of the 33rd International Conference on Very Large Databases. Lyon, France. 2009. New York: ACM Press, 2009. 926~937
    [75] Chung C. W., Min J. K., Shim K. APEX: An Adaptive Path Index for XML data. In: M. J. Franklin, B. Moon, A. Ailamaki eds.. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, Wisconsin, 2002. New York: ACM Press, 2002. 121~132
    [76] Shokoufandeh A., Dickinson S. J., Siddiqi K., et al. Indexing using a Spectral Encoding of Topological Structure. Proceedings of 1999 Conference on Computer Vision and Pattern Recognition. Ft. Collins, CO, USA. Washington, D.C.: IEEE Computer Society, 1999. 2491~2497
    [77]黄崇本,陶剑文,程光化.一种面向图包容搜索的图索引模型.计算机应用, 2008(28): 479~483
    [78] Cooper B. F., Sample N., Franklin M. J., et al. A Fast Index for Semistructured Data. In: P. M. Apers, P. Atzeni, S. Ceri, et al eds.. Proceedings of 27th International Conference on Very Large Data Bases. Rome, Italy. 2001. San Francisco: Morgan Kaufmann, 2001. 341~350
    [79] Tribl S., Leser U. Fast and Practical Indexing and Querying of Very Large Graphs. In: C. Y. Chan, B. C. Ooi, A. Zhou eds.. Proceedings of the 2007 ACM SIGMOD international conference on Management of Data. Beijing, China. 2007. New York: ACM Press, 2007. 845~856
    [80] Zhao P., Yu J. X., Yu P. S. Graph Indexing: Tree + Delta <= Graph. In: C. Koch, J. Gehrke, M. N. Garofalakis, et al eds.. Proceedings of the 33rd InternationalConference on Very large Databases. Vienna, Austria. 2007. New York: ACM Press, 2007. 938~949
    [81] He H., Singh A. K. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. in J. Tsong, L. Wang eds.. Prceedings of the ACM SIGMOD International Conference on Management of Data. Vancouver, BC, Canada. 2008. New York: ACM Press, 2008. 405~418
    [82] Cheng J., Ke Y., Ng W., et al. SG-Index: Towards Verification-Free Query Processing on Graph Databases. In: C. Y. Chan, B. C. Ooi, A. Zhou eds.. Proceedings of the 2007 ACM SIGMOD international conference on Management of data. Beijing, China. 2007. New York: ACM Press, 2007. 857~872
    [83] Natale R. D., Ferro A., Giugno R., et al. SING: Subgraph search In Non-homogeneous Graphs. BMC Bioinformatics, 2010(11): 96~135
    [84] He H., Singh A. K. Closure-Tree: An Index Structure for Graph Queries. In: L. Liu, A. Reuter, K. Y. Whang eds.. Proceedings of the 22nd International Conference on Data Engineering. Atlanta, GA, USA. 2006. Washington, D.C.: IEEE Computer Society, 2006. 38~50
    [85] Suel T., Yuan J. Compressing the Graph Structure of the Web. Proceedings of the IEEE Data Compression Conference. Snowbird, Utah, USA. 2001. March 2001. Washington, D.C.: IEEE Computer Society, 2001. 213~222
    [86] Fan W., Zhang K., Cheng H., et al. Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. In: Y. Li, B. Liu, S. Sarawagi eds.. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. Las Vegas, Neveda, USA. 2008. New York: ACM Press, 2008. 230~238
    [87] Cheng H., Lo D., Zhou Y. Identifying Bug Signatures Using Discriminative Graph Mining. In: G. Rothermel, L. K. Dillon eds.. Proceedings of the 18th International Symposium on Software Testing and Analysis. Chicago, IL, USA. 2009. New York: ACM Press, 2009. 141~152
    [88] Giugno R., Shasha D. GraphGrep: A Fast and Universal Method for Quering Graphs. In: Proceedings of the 16th International Conference on Pattern Recognition.Quebec, Canada. 2002. Washington, D.C.: IEEE Computer Society, 2002. 219~226
    [89] Zaki M. J. Efficiently Mining Frequent Trees in a Forest. In: Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada. 2002. New York: ACM Press, 2002. 71~80
    [90] Horvath T., Ramon J., Wrobel S. Frequent Subgraph Mining in Outerplanar Graphs. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA, USA. New York: ACM Press, 2006. 197~206
    [91] Zhu F., Yan X., Han J., et al. gPrune: A Constraint Pushing Framework for Graph Pattern Mining. In: Z. Zhou, H. Li, Q. Yang eds.. Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Nanjing, China. 2007. Berlin: Springer-Verlag, 2007. 388~400
    [92] Ke Y., Cheng J., Yu J. X. Top-k Correlative Graph Mining. Proceedings of the 2009 SIAM Conference on Data Mining. Sparks, Nevada, USA. 2009. Philadelphia: SIAM, 2009. 1038–1049
    [93] Shinoda M., Ozaki T., Ohkawa T. Weighted Frequent Subgraph Mining in Weighted Graph Databases. Proceedings of the 25th International Conference on Data Engineering. Miami, Florida, USA. 2009. Washington, D.C.: IEEE Computer Society, 2009. 58~63
    [94] Yan X., Han J. CloseGraph Mining Closed Frequent Graph Patterns. In: L. Getoor, T. E. Senator, P. Domingos, et al eds.. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC. USA. 2003. New York: ACM Press, 2003. 286~295
    [95] Faloutsos C., McCurley K. S., Tomkins A. Fast Discovery of Connection Subgraphs. In: W. Kim, R. Kohavi, J. Gehrke, et al eds.. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, WA, USA. 2004. New York: ACM Press, 2007. 118~127
    [96] Yan X., Cheng H., Han J., et al. Mining Significant Graph Patterns by Leap Search. In J. Tsong, L. Wang eds.. Prceedings of the ACM SIGMOD International Conference on Management of Data. Vancouver, BC, Canada. 2008. New York:ACM Press, 2008. 433~444
    [97] Zeng Z., Wang J., Zhou L. Efficient Mining of Minimal Distinguishing Subgraph Patterns from Graph Databases. In: M. J. Zaki, J. X. Yu, B. Ravindran, et al eds.. Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining. Osaka, Japan. 2008. New York: ACM Press, 2008. 1062~1068
    [98] Wang X., Smalter A., Huan J., et al. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases. In: M. L. Kersten, B. Novikov, J. Teubner, et al eds.. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. Saint Petersburg, Russia. 2009. New York: ACM Press, 2009. 472~480
    [99] Tian Y., Patel J. M. TALE: A Tool for Approximate Large Graph Matching. Proceedings of the 24th International Conference on Data Engineering. Cancun, Mexico. 2008. Washington, D.C.: IEEE Computer Society, 2008. 963~972
    [100] Tribl S., Leser U. Querying Ontologies in Relational Databases Systems. In: B. Ludascher, L. Raschid eds.. Proceedings of the 2nd International Workshop on Data Integration in the Life Sciences. San Diego, CA, USA. 2005. Berlin: Spinger--Verlag, 2005. 63~67

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700