用户名: 密码: 验证码:
多模型下的近似字符串匹配算法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
近似字符串匹配是模式匹配研究领域中的一个重要问题。近年来,随着各学科的迅速发展,在许多不同背景下对于近似串匹配问题的研究逐渐受到人们的关注特别是在计算生物学等新兴学科中,有许多近似串匹配的新模型被陆续提出。一方而,这些模型在各学科中均有非常重要的应开用;另一方而,由于这些模型被提出的时问大多不长,因此对它们的研究并不充分。因此,对不同模型下的近似串匹配问题进行研究就成为了在模式匹配和诸多相关领域中的研究重点和难点。针对这一现状,研究的主要内容就是三种近年被提出的模型下的近似串匹配问题。这三种模型分别是:属性匹配模型,交换匹配模型以及块匹配模型。
     针对属性匹配,提出了一种新的索引结构CIDS-PIP (compressed indexed data structure for property indexing problem)及相应的匹配算法。在该算法中采用了压缩后缀数组作为核心的索引结构。为了进一步降低索引的空间开销,针对不同的属性规模,提出了两种解决方案。针对这两种方案设计了不同的辅助索引结构,以同时满足较高的查询效率和较低的空间开销。与现有的支持属性匹配的索引相比,CIDS-PIP的空间开销更低。
     针对交换匹配,提出了一种新的离线匹配算法,并证明更精确的模式交换版本的个数上限。该交换匹配离线算法是建立在已有的全文索引的基础上,而非设计全新的索引数据结构。在该算法中,采用后缀数组或压缩后缀数组作为索引结构。当模式长度较小时,该算法可以达到良好的时间效率,并明显优于现有的属性匹配在线算法。此外,还解决了近似交换匹配问题。实验证明,相对于已有的在线交换匹配算法,该算法的时间效率大幅提高。
     块模式匹配模型是Julio N等人在2011年提出的一种匹配模型,现在在此基础上进行的研究并不多见。针对在线和离线两种情况下的块模式匹配问题进行了研究,并且分别给出了新的算法。相对于现有的在线匹配算法,新的在线算法需要更低的空间开销,而时间开销并没有增加。相开对于现有的离线匹配算法,新的离线算法的时间效率更高。与此同时,还指出了在Julio等人论文中的一处不正确的表述,并对其算法的时间复杂度进行了修正。此外,对块模式匹配问题的两个衍生问题:融合块模式匹配问题和突变块模式匹配问题进行了研究并提出了新的算法。
Approximate string matching is an important issue in the research areas of pattern matching. With the rapid development of various disciplines, the research of approximate string matching in many different areas gradually attracts lots of attentions. Particularly, in the emerging disciplines such as computational biology, many models for approximate string matching are proposed. On one hand, these models can be applied widely in many areas. On the other hand, the research on pattern matching under these models is not enough, since most of these models have been proposed for not long time. In view of this situation, we studied the approximate pattern matching problems under three different models, which are property pattern matching, swap matching and blocked pattern matching.
     For the property matching problem, an off-line matching algorithm is proposed. In that algorithm, the indexes based on compressed suffix arrays are used as the care structure. In order to reduce more space overhead, we gave two solutions for different sizes of the property. In these two solutions, different auxiliary structures are designed in order to meet the higher query efficiency and lower space overhead. Compared with other existing index supporting the property matching, the space overhead of our solution is much smaller. As return, the time complexity of the matching is a little higher.
     For the swap matching problem, an off-line matching algorithm is proposed, and it is the first off-line swap matching algorithm. At the same time, a more accurate upper bound of the number of swap versions of a pattern is given. In this solution, the existing full text index is used as our index instead of designing new ones. Our solution is more effective than the existing on-line algorithms when the pattern is short.
     The Blocked Pattern Matching (BPM) Problem is studied. Blocked pattern matching problem is proposed by Julio N. et.al in2011. Now there is little research on this problem. For the blocked pattern matching problem and other related problems, a group of algorithms which improved the time efficiency or space efficiency separately are given. An incorrect statement in Julio's paper is found, and a solution for the problem which needs less searching time is proposed. The mutated blocked pattern matching problem and fused blocked pattern matching problem are studied, and new algorithms are given.
引文
[1]Navarro G.. A guided tour to approximate string matching. ACM computer survey, 2001,33:31-88
    [2]Levenshtein V. I.. Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission,1965,1:8-17
    [3]Navarro, G.. Approximate Text Searching. PhD thesis, Department of Computer Science, University of Chile,1998
    [4]Luque G, Alba E. Metaheuristics for the DNA Fragment Assembly Problem. International Journal of Computational Intelligence Research 2005,1(2):98-108
    [5]Fiasconaro A., Fernando F.. Dynamical model for the full stretching curve of DNA. Physical Review E,2012,86,032902:1-4
    [6]Hood L., Heath J. R., Phelps. E, et.aL Systems Biology and New Technologies Enable Predictive and Preventative Medicine, Science Signalling,2004,306 (5696): 640-643
    [7]Fridman, W. H., Pages, F., Sautes-Fridman, C., & Galon, J. The immune contexture in human tumours:impact on clinical outcome. Nature Reviews Cancer,2012,12(4): 298-306
    [8]Laferriere A, Gautheret D, Cedergren R. An RNA pattern matching program with enhanced performance and portability. Computer Applications in Biosciences,1994, 10(2):211-212
    [9]Andreotti, Sandro, Gunnar W. Klau, and Knut Reinert. "Antilope—A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem" IEEE/ACM Transactions on Computational Biology and Bioinformatics,2012,9(2):385-394
    [10]Gupta N., Pevzner P. A. False discovery rates of protein identifications:A strike against the two-peptide rule. Journal of Proteome Research,2009,8(9):4173-4181
    [11]Tago S., Asai T, Katoh T, Morikawa, H et.al. EVIS:a fast and scalable episode matching engine for massively parallel data streams. In Database Systems for Advanced Applications, Berlin, Springer,2012:213-223
    [12]Lewi G. K., Fouts, T. R., Ibrahim, S., et. al. "Identification and Characterization of an Immunogenic Hybrid Epitope Formed by both HIV gp120 and Human CD4 Proteins." Journal of Virology,2012,86(9):5410
    [13]Pei. J, Grishin. N.V.. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004,56(4):782-794
    [14]San-Segundo, R, Carlos D. Martinez H, et.al. Review Of Research On Speech Technology:Main Contributions From Spanish Research Groups.Journal of Speech Sciences,2012,1(1):31-53
    [15]Atallah M. J.. Handbook of Algorithms and Theory of Computation. Boca Raton: CRC Press,1998,326-373
    [16]Mousavi, S. R., Tabataba, F. An improved algorithm for the longest common subsequence problem. Computers & Operations Research,2012:39(3),512-520
    [17]Albagli, Sivan, Rachel Ben-Eliyahu-Zohary, and Solomon E. Shimony. "Markov network based ontology matching." Journal of Computer and System Sciences 78.1 (2012): 105-118
    [18]Albagli, S., Ben-Eliyahu-Zohary, R., Shimony, S. E.. Markov network based ontology matching. Journal of Computer and System Sciences,2012,78(1):105-118
    [19]Masters, H. A study of spelling errors. University of Iowa Studies in Education.1927, 4:4
    [20]Damerau, F.. A technique for computer detection and correction of spelling errors. Communications of the. ACM.1964,7(3):171-176
    [21]Kukich K.. Techniques for automatically correcting words in text. ACM Computer. Surveys.1992,24(4):377-439
    [22]Nadkarni P., Chapman W., Ohno-Machado L.. Natural language processing:an introduction. Journal of The American Medical Informatics Association.2011,18: 544-551
    [23]Himanshu S., Neeraj. K, Govind K, et.al. A Natural Language Interface Based on Machine Learning Approach. Communications in Computer and Information Science, 2011,197(3):549-557
    [24]Matsushita, T., Cheng, C., Murata, Y., Zhu, B.,& Nakagawa, M. Effect of Text/Non-text Classification for Ink Search Employing String Recognition. In 10th IAPR International Workshop on Document Analysis Systems (DAS), IEEE Press,2012: 230-234
    [25]Califf M. E., Mooney R. Relational learning of pattern-match rules for information extraction Proceedings of the Sixteenth National Conference on Artificial intelligence. AⅢ Press,2007:9-15
    [26]Clark M., Kim Y., Kruschwitz U., et. al. Automatically structuring domain knowledge from text:An overview of current research. Information Processing & Management,2012,48(3):552-568
    [27]Conte D., Foggia P., Sansone C. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence.2004, 18(3)265-298
    [28]Ouyang, W., Tombari, F., Mattoccia, S., et.al. Performance evaluation of full search equivalent pattern matching algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(1):127-143
    [29]Hung X. H, Lin C. H, Hong J.H. Web mining for event-based commonsense knowledge using lexico-syntactic pattern matching and semantic role labeling. Expert Systems with Applications.2010,31(1):341-347
    [30]Ding, J., Hu, H., Li, X. Thousands of cis-regulatory sequence combinations are shared byArabidopsis and poplar. Plant physiology,2012,158(1):145-155
    [31]Lewin B. Genes of SMA:Multum in parvo. Cell. 1995,80:1-5
    [32]Julio.N., Amir A., Pevzner. P. A. Blocked Pattern Matching Problem and Its Applications in Proteomics. In Proceedings of 5th Annual International Conference of Research in Computational Molecular Biology. Berlin Heidelberg:Springer,2011: 298-319
    [33]Sellers. P. H.. The theory and computation of evolutionary distances:Pattern recognition. Journal of Algorithms.1980,1(4):359-373
    [34]Ukkonen. E. Finding approximate patterns in strings. Journal of Algorithms,1985, 6(1-3):132-137
    [35]Chang. W. I, Lampe. J. Theoretical and empirical comparisons of approximate string matching algorithms. In Proceedings of the 2nd Annual Symposium on Combinatorial Pattern Matching. Berlin, Heidelberg:Springer,1992:195-184
    [36]Baeza-Yates. R. Some new results on approximate string matching. In Proceedings of the 2nd Annual Symposium on Combinatorial Pattern Matching. Berlin, Heidelberg: Springer,1992:195-184
    [37]Wu S. and Manber. U. Fast text searching alowing errors. Communications of the ACM,1992,35(10):83-91
    [38]Bareza-YatesR.A. and Navarro G.. Faster approximate string matching. Algorithmica, 1999,23(2):127-158
    [39]Navarro. G, Bareza-Yates. R.A. Very fast and simple approximate string matching. Information Processing Letters,1999,72:65-70
    [40]Navarro. G, Raffinot M. Fast regular expression search. In Proceedings of the 3rd Workshop on algorithm Engineering. Springer:1999,1668:199-213
    [41]Navarro G. and Raffinot M.. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics,5(4), 2000:1-36
    [42]Kececioglu J. and Sankoff D.. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica,1995,13(1):180-210
    [43]Lopresti D. P. and Tomkins A.. Block edit models for approximate string matching. Theoretical Computer Science.1997,181(1):159-179
    [44]Kim D. K., Lee J. S. Park J. S, et.al. Efficient algorithms for approximate string matching with swaps. Journal of Complexity.1999,15:128-147
    [45]Cobbs. A.. Fast Approximate Matching using Suffix Trees. In Proceedings of the 6th Annual Symposium on Combinatorial Pattern Matching. Springer:1995,807,41-54
    [46]Cole. R, Gottliev. L.A., Lewenstein. M. Dictionary Matching and Indexing with Errors and Don't Cares. In Proceedings of the 36th annual ACM symposium on Theory of computing. New York:ACM Press,2004:91-100
    [47]Huynh, T., Hon, W.K., Lam, T.W. et.al. Approximate string matching using compressed suffix arrays. In Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching, New York:Springer,2006:434-444
    [48]Navarro, G., Baeza-Yates, R., Sutinen, E., et.al. Indexing methods for approximate string matching. IEEE Data Engineering Bulletin.2010,24:19-27
    [49]Amir. A., Chencinski. E., Iliopoulos C.S., et.aL Property matching and weighted matching. Theoretical Computer Science,2008,395:298-310
    [50]Jurka J.. Origin and evolution of alu repetitive elements. The impact of Short Interspersed Elements (SINEs) on the Host Genome. Austin:RG Landes Company, 1995:25-41
    [51]Tisch, D., Kubicek, C. P., Schmoll, M. New insights into the mechanism of light modulated signaling by heterotrimeric G-proteins:ENVOY acts on gnal and< i> gna3 and adjusts cAMP levels in Trichoderma reesei ( Hypocrea jecorina). Fungal Genetics and Biology,2011,48(6):631-640
    [52]Ferragina. P, Manzini. G, Makinen. V, et. Al. Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms 2007,3(2),1-24
    [53]Cohen. H, Porat. E:Fast Set Intersection and Two-Patterns Matching. Proceedings of International Conference on Current Trends in Theory and Practice of Computer Science 2008.:316-327
    [54]Iliopoulos C. S. and Rahman. M. S. Faster index for property matching. Information Processing Letters,2008,105:218-223
    [55]Juan M. T, Liu. J. J, Wang. Y. L. Errata for "Taster index for property matching". Information Processing Letters,2009,109:1027-1029
    [56]Sadakan K. Compressed suffix trees with full functionality. Theory of Computing Systems,2007,41(4),589-607
    [57]Grossi R, Vitter. J. S. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing,2005,35(2),378-401
    [58]Ferragina P, Manzini G. Opportunistic data structures with application In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science. IEEE Press,2000390-398
    [59]Jurka. J.. Human repetitive elements. Molecular Biology and Biotechnology. Royal Society of Chemistry,1995:438-441
    [60]Navarro. G, Makinen. V. Compressed full-text indexes. ACM Computing Surveys, 2007,39:1-68
    [61]Chen. G, Puglisi. S. J, Smyth. W. F. LZ factorization using less time and space. Mathematics in Computer Science.2007,1(4):605-623
    [62]Benoit. D, Demaine. E. D, Munro J. I, et. al. Representing trees of higher degree. Algorithmica,2005,43(4):275-292
    [63]Golynski A, Grossi R, Gupta A, et. al. On the size of succinct indices. In Proceedings of 15th Annual European Symposium on Algorithms, New York:Springer,2007:371-382
    [64]Bender M, Farach-Colton M. The LCA Problem Revisited. In 4th Latin American Symposium Proceedings. Berlin Heidelberg:Springer,2000:88-94
    [65]Sadakane K.. New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms.2003,48 (2):294-313
    [66]Sadakane K, and Grossi R., Squeezing succinct data structures into entropy bounds, in:Procs ACM-SIAM SODA,2006:1230-1239
    [67]Sadakane K.. Space-efficient data structures for flexible text retrieval systems. In 13th International Symposium, ISAAC. Berlin Heidelberg:Springer,2002:14-24
    [68]Fischer J. and Heun V. A New Succinct Representation of RMQ-Information and Improvements in the Enhanced Suffix Array. Combinatorics, Algorithms, Probabilistic And Experimental Methodologies. Lecture Notes in Computer Science,2007,4614: 459-470
    [69]Fischer J., Heun V., and Stuhler. H. M.. Practical entropy bounded schemes for O(l)-range minimum queries. In Proceedings of Data Compression Conference 2008. IEEE Press,2008:272-281
    [70]Fischer, J. Combined data structure for previous-and next-smaller-values. Theoretical Computer Science,2011,412(22):2451-2456
    [71]Lowrance. R., Wagne. R. A., An extension of the string-to-string correction problem, Journal of Associated Computer.1975:Mc22,177-183
    [72]Wagner R. A., On the complexity of the extended string-to-string correction problem. In Proceedings of 7th ACM ACM symposium on Theory of computing. New York, ACM Press,1975:218-223
    [73]Muthukrishnan S. and Ramesh H.. String matching under a general matching relation, Information And Computation.1995,122:140-148
    [74]Muthukrishnan S. and Palem K., Non-standard stringology:Algorithms and complexity. In Proceedings of 26th Annual Symposium on the Theory of Computing, New York, ACM Press,1994:770-779
    [75]Galil. Z, Park K.. An improved algorithm for approximate string matching, SIAM Journal of Computer.1990,19:989-999
    [76]Landau G. M. and Vishkin U.. Fast parallel and serial approximate string matching. Journal of Algorithms.1989,10:157-169
    [77]Landau G. M., Myers E. W., and Schmidt J. P.. Incremental string comparison, SIAM Journal of Computer.1998,27:557-582
    [78]Sahinalp S. C. and Vishkin U.. Efficient approximate and dynamic matching of patterns using a labeling paradigm, in Proceedings of 37th IEEE Foundation of Computer Science. New York:IEEE Computer Society,1996:320-328
    [79]Cole R. and Hariharan R.. Approximate string matching:A faster simpler algorithm, SIAM Journal of Computing.2002:31(6):1761-1782
    [80]Amir A., Lewenstein M., and Porat E.. Faster string matching with k mismatches, In Proceedings of the Eleventh Symposium on Discrete Algorithms. Philadelphia, PA:ACM Press,2000:794-803
    [81]Clifford, R., Efremenko, K., Porat, B., et al. A Mismatch sampling. Information and Computation,2012,214(1):112-118
    [82]Aumann, Y, Lewenstein, M., Lewenstein, N.,et. al. Finding witnesses by peeling. ACM Transactions on Algorithms,2011,7(2):24
    [83]Muthukrishnan S.. New results and open problems related to nonstandard stringology. In Proceedings of 6th Combinatorial Pattern Matching Conference,1995:298-317
    [84]Amir A., Aumann Y, Landau G., et. al. Pattern matching with swaps. In Proceedings of 38th IEEE Foundation of Computer Science. New York:IEEE Press,1997:144-153
    [85]Amir A., Landau G., Lewenstein M., and Lewenstein N.. Alphabet dependence in parameterized matching. Information Processing Letters,1998,68(3):125-132
    [86]Amir A., Cole R., Hariharan R., et. al. Overlap matching. Information and Computation,2003,181(1):57-74
    [87]Iliopoulos C.S. and Rahman M.S.. A new model to solve the swap matching problem and efficient algorithms for short patterns. In Proceedings of SOFSEM 2008, Springer: Lecture Notes in Computer Science,2008,4910:316-327
    [88]Cantone D. and Faro S.. Pattern Matching with Swaps for Short Patterns in Linear Time. In Proceedings of SOFSEM 2009, Springer, LNCS 2009,5404:255-266
    [89]Amir A., Lewenstein M., Porat E.. Approximate swapped matching. Information Processing Letters,2002,83(1):33-39
    [90]Park, C.Y., Klammer, A. A., Kall, L., et. Al. Rapid and accurate peptide identification from tandem mass spectra. Journal of Proteome Research 2008,7(7):3022-3027
    [91]Tanner, S., Shu, H., Frank, A., et.al:Inspect:Identification of posttranslationally modified peptides from tandem mass spectra. Analytical Chemistry.2005,77(14): 4626-4639
    [92]Chan, P. P., Holmes, A. D., Smith, A. M.,et.al. The UCSC archaeal genome browser: 2012 update. Nucleic acids research,2012,40(D1):D646-D652
    [93]Drake, P. M., Schilling, B., Niles, R. K., et.al. Lectin Chromatography/Mass Spectrometry Discovery Workflow Identifies Putative Biomarkers of Aggressive Breast Cancers. Journal of proteome research,2012,11 (4):2508-2520
    [94]Frank, A.M., Pevzner, P.A.:PepNovo:De Novo Peptide Sequencing via Probabilistic Network Modeling. Analytical Chemistry.2005,77:964-973
    [95]Jeong K, Kim S, Bandeira N, Pevzner PA Gapped spectral dictionaries and their applications for database searches of tandem mass spectra. Molecular & Cellular Proteomics,2011:M110.002220
    [96]Sheng, Q., Dai, J., Wu, Y, Tang, H.,et.al. BuildSummary:Using a Group-Based Approach To Improve the Sensitivity of Peptide/Protein Identification in Shotgun Proteomics. Journal of Proteome Research,2012,,11(3):1494-1502
    [97]Kim, S., Gupta, N., Pevzner, P.A.:Spectral probabilities and generating functions of tandem mass spectra:A strike against decoy databases. Journal of Proteome Research. 2008,7(8):3354-3363
    [98]Kim, S., Gupta, N., Bandeira, N., Pevzner, P.A.. Spectral dictionaries:Integrating de novo peptide sequencing with database search of tandem mass spectra. Molecular & Cellular Proteomics.2009,8(1):53-69
    [99]Amir, A.. Asynchronous pattern matching. In Proceedings of 17th Combinatorial Pattern Matching Conference. Berlin, Heidelberg:Springer,2006:1-10
    [100]Amir, A., Aumann, Y., Benson, G., et. Al. Pattern matching with address errors: rearrangement distances. In Procedings of 17th ACM-SIAM Symposium on Discrete Algorithms (SODA), New York:ACM Press,2006:1221-1229
    [101]Linhart C., Shamir R.. Matching with don't-cares and a small number of mismatches. Information processing letters.2009:109(5):273-277
    [102]Elenitoba-Johnson, K.S.J., Crockett, D.K., Schumacher, J.A., et. al:Proteomic identification of oncogenic chromosomal translocation partners encoding chimeric anaplastic lymphoma kinase fusion proteins. Proceedings of the National Academy of Sciences.2006,103(19):7402-7407
    [103]Demaine, E.D., Lopez-Ortiz, A., Munro, J.I.:Adaptive set intersections, unions, and differences. In:Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms. Philadelphia, PA:Society for Industrial and Applied Mathematics 2000:743-75
    [104]Baeza-Yates, R.A.:A fast set intersection algorithm for sorted sequences. In Proceedings of 15th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, Springer,2004:3109:400-408
    [105]Barbay, J., Lopez-Ortiz, R., Lu, T.:Faster adaptive set intersections for text searching. In Experimental Algorithms:5th International Workshop, WEA 2006, Lecture Notes in Computer Science 4007:146-157
    [106]Bille, P., Pagh, A., Pagh, R.:Fast evaluation of union-intersection expressions. In Biennial Conference of the International Society for augmentative and Alternative Communication., Berlin, Heidegger:Springer,2007:739-750
    [107]Cohen H., Porat E.:Fast set intersection and two-patterns matching. In Proceedings of In 14th Latin American Symposium Proceedings. Berlin, Heidegger: Springer,2010:234-242
    [108]Ferragina P., Gonzalez R., Navarro G. Compressed text indexes:From theory to practice. Journal of Experimental Algorithmics:2009,1(12):1-31
    [109]Makinen, V.. Compact suffix array. In Proceedings of 11th Annual Symposium on Combinatorial Pattern Matching (CPM). Berlin, Heidegger:Springer,2000:305-319
    [110]Grossi, R., Gupta, A., and Vitter, J.. High-order entropy-compressed text indexes. In Proc.14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA),2003,11, 841-850
    [111]Lee S. and Park K.. Dynamic rank-select structures with applications to run-length encoded texts. In Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching (CPM). Berlin, Heidegger:Springer,2000:95-106
    [112]Gonzalez, R., Navarro, G.:In Proceedings of In 12th Latin American Symposium Proceedings. Berlin, Heidegger:Springer,2010:374-386
    [113]Golynski A., Munro J. I., and Rao S. S.. Rank/select operations on large alphabets:a tool for text indexing. Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA:ACM Press,2006:368-373
    [114]Gonzalez R., Navarro G.. Statistical encoding of succinct data structures, In Proceedings of 16th Annual Symposium on Combinatorial Pattern Matching (CPM). Berlin, Heidegger:Springer,2006:295-306
    [115]Sadakane K., Grossi R.. Squeezing succinct data structures into entropy bounds. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA:ACM Press,2006:1230-1239
    [116]Burrows M. and Wheeler D.. A block sorting lossless data compression algorithm. Digital Equipment Corporation Technical Report 124,1994:1-17
    [117]Ferragina P., Venturini R.. A simple storage scheme for strings achieving entropy bounds. Theoretical Computer Science,2007,372 (1):115-121
    [118]Leonard, M., Mouchard, L., Salson, M. On the number of elements to reorder when updating a suffix array. Journal of Discrete Algorithms,2012,11:87-99
    [119]Grossi, R., Gupta, A., and Vitter, J.. When indexing equals compression: Experiments with compressing suffix arrays and applications. In Proceedings of 15th Annual ACM-SIAM Symposium on Discrete Algorithms,2004:636-645
    [120]Ferragina P, Manzin G.. Indexing compressed text. Journal of ACM 2005,52: 552-581
    [121]Peng, Y. H., Yang, C. B., Tseng, C. T., et. al. A new efficient indexing algorithm for one-dimensional real scaled patterns. Journal of Computer and System Sciences,2012, 78(1):273-278
    [122]Barsky, M., Stege, U., Thomo, A. Suffix trees for inputs larger than main memory. Information Systems,2011,36(3):644-654.
    [123]Ferragina P., Manzini G., Makinen V. Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms,2007,3(2):1-24
    [124]Tsang D., Chawla S.. A robust index for regular expression queries. Proceedings of the 20th ACM international conference on Information and knowledge management, New York:ACM Press,2011:2365-2368
    [125]Cho J. and Rajagopalan S.. A fast regular expression indexing engine. In Proceedings 18th International Conference on Data Engineering, Los Alamito:IEEE Computer Society,2002:419-430
    [126]刘燕兵,刘萍,谭建龙等。基于存储优化的多模式串匹配算法。计算机研究与发展,2009,46(10):1768-1776
    [127]刘小珠,彭智勇。全文索引技术时空效率分析。软件学报,2009,20(7):1768-1784
    [128]Ferragina P, Navarro G.. Pizza&Chili Corpus:Compressed Indexes and their Testbeds,2009, http://pizzachili.dcc. uchile.cl/index. html
    [129]Knuth D.E., Morris J. H., Pratt V. R.. Fast pattern matching in strings. SIAM Journal on Computing,1977,6(1):323-350

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700