Research on Online Algorithms for Sequence Labeling
Abstract
The sequence model is a classic structured model that has been widely applied in natural language processing, computer vision, bioinformatics, and other fields. Research on and improvement of this model and its algorithms are of great significance and practical value. Over the past several years, researchers have achieved notable results in the study of sequence models, but many problems remain worth exploring; this thesis conducts in-depth research on several of them.
     First, we propose a method that maximizes the classification margin of each label pair, overcoming a shortcoming of the traditional margin-maximization approach: it tends to maximize the margin of the whole sequence while ignoring the correctness of local labels. By using the labelwise maximum-margin criterion in place of the optimization objective of the parameter-update step in the online passive-aggressive algorithm, we obtain a new online algorithm. Experiments verify that, compared with the original algorithm, it improves sequence-labeling performance.
     Second, this thesis studies the parallelization of the online passive-aggressive algorithm. By modifying its training process, we propose a parallelized online passive-aggressive algorithm. We prove theoretically that the parallelized algorithm has the same upper bound on cumulative classification errors as the original algorithm, verify its correctness experimentally, and show that it achieves a useful speedup on a distributed platform, providing a feasible and effective way to exploit the computing power of multi-core platforms.
     This thesis studies two problems related to the online passive-aggressive algorithm for sequence labeling, proposes improvements to the model and algorithms, and verifies experimentally that the improved algorithms achieve better performance.
The sequence model is a classical structured model, which has been widely applied in many areas such as natural language processing, computer vision, and bioinformatics. In the past several years, researchers have achieved some results in the study of sequence models, but there are still many problems to be explored. In this thesis, several problems concerning sequence models are chosen for in-depth study.
     First, we propose a labelwise maximum-margin method to overcome a shortcoming of the traditional method, which tends to maximize the margin of the whole sequence and ignores the correctness of local labels. By using the labelwise maximum-margin criterion instead of the whole-sequence maximum-margin criterion in the online passive-aggressive algorithm, we obtain a new online algorithm called labelwise PA. Experiments show that the new algorithm achieves better performance.
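As a rough illustration, a single update of the standard online passive-aggressive (PA-I) scheme for sequence labeling can be sketched as follows. The abstract does not give the exact form of the labelwise criterion, so this sketch shows only the baseline whole-sequence margin update that the thesis's method replaces; the feature vectors and Hamming-based cost are the usual structured-PA ingredients.

```python
import numpy as np

def pa_update(w, phi_gold, phi_pred, hamming_loss, C=1.0):
    """One PA-I update for sequence labeling (baseline sketch).

    w: weight vector; phi_gold / phi_pred: feature vectors of the gold
    and the (Viterbi-)predicted label sequences; hamming_loss: number
    of wrongly predicted labels in the sequence.
    """
    delta = phi_gold - phi_pred
    # Hinge loss on the whole-sequence margin: the gold sequence should
    # outscore the prediction by at least sqrt(Hamming cost).
    loss = max(0.0, np.sqrt(hamming_loss) - w.dot(delta))
    if loss == 0.0 or not delta.any():
        return w  # margin already satisfied: stay "passive"
    tau = min(C, loss / delta.dot(delta))  # clipped "aggressive" step
    return w + tau * delta
```

The labelwise variant would impose a margin constraint per label pair rather than one constraint on the whole sequence, so updates can fire even when the sequence-level margin is already satisfied.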
     Second, we study a parallel version of the online passive-aggressive algorithm. We propose the parallel PA algorithm by modifying the training process of the original online passive-aggressive algorithm. We prove theoretically that the parallel algorithm has the same upper bound on cumulative errors as the original algorithm. We also verify the correctness of the parallel algorithm experimentally, and measure the speedup of the algorithm on a distributed platform. This work provides an effective solution for exploiting multi-core computing platforms.
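The parallel training loop can be illustrated with a parameter-mixing sketch: each worker starts from the shared weights, runs sequential PA updates on its own data shard, and the local weights are then merged. Uniform averaging is an assumption here, borrowed from distributed-perceptron parameter mixing; the abstract does not specify the thesis's exact merging rule.

```python
import numpy as np

def pa_update(w, phi_gold, phi_pred, cost, C=1.0):
    # Standard PA-I step: hinge loss on the sequence margin.
    delta = phi_gold - phi_pred
    loss = max(0.0, cost - w.dot(delta))
    if loss == 0.0 or not delta.any():
        return w
    return w + min(C, loss / delta.dot(delta)) * delta

def parallel_pa_epoch(shards, w):
    """One epoch of a parameter-mixing parallel scheme (sketch).

    Each shard is a list of (phi_gold, phi_pred, cost) triples;
    conceptually each shard is processed by its own core, and the
    per-worker weight vectors are averaged at the end of the epoch.
    """
    local_weights = []
    for shard in shards:
        w_local = w.copy()  # every worker starts from the shared weights
        for phi_gold, phi_pred, cost in shard:
            w_local = pa_update(w_local, phi_gold, phi_pred, cost)
        local_weights.append(w_local)
    return np.mean(local_weights, axis=0)  # uniform mixing (assumption)
```

In a real implementation the inner loop over shards would run concurrently (e.g. under MapReduce, as in reference [15]), with the averaging as the reduce step.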
     In this thesis, we study two problems related to the online passive-aggressive algorithm for sequence labeling. We propose improvements to the model and algorithm, and experiments show that the improved algorithms achieve better performance.
References
    [1] SMINCHISESCU C, KANAUJIA A, LI Z, et al. Conditional models for contextual human motion recognition[C]//10th IEEE International Conference on Computer Vision. New York, NY, USA: IEEE, 2005, 2:1808-1815.
    [2] MCDONALD R, PEREIRA F. Identifying gene and protein mentions in text using conditional random fields[J]. BMC Bioinformatics, 2005, 6(Suppl 1):S6.
    [3] LI M, LIN L, WANG X, et al. Protein-protein interaction site prediction based on conditional random fields[J]. Bioinformatics, 2007, 23(5):597.
    [4] MCCALLUM A, FREITAG D, PEREIRA F C N. Maximum entropy Markov models for information extraction and segmentation[C]//Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000:591-598.
    [5] LAFFERTY J D, MCCALLUM A, PEREIRA F C N. Conditional random fields: Probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the Eighteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001:282-289.
    [6] SUTTON C, ROHANIMANESH K, MCCALLUM A. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data[C]//Proceedings of the Twenty-First International Conference on Machine Learning. New York, NY, USA: ACM, 2004:99-106.
    [7] SARAWAGI S, COHEN W. Semi-Markov conditional random fields for information extraction[J]. Advances in Neural Information Processing Systems, 2005, 17:1185-1192.
    [8] TASKAR B, GUESTRIN C, KOLLER D. Max-margin Markov networks[C]//Advances in Neural Information Processing Systems 16. Cambridge, MA, USA: MIT Press, 2003.
    [9] TSOCHANTARIDIS I, HOFMANN T, JOACHIMS T, et al. Support vector machine learning for interdependent and structured output spaces[C]//Proceedings of the Twenty-First International Conference on Machine Learning. New York, NY, USA: ACM, 2004:104.
    [10] COLLINS M. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms[C]//Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10. Stroudsburg, PA, USA: Association for Computational Linguistics, 2002:1-8.
    [11] CRAMMER K, DEKEL O, KESHET J, et al. Online passive-aggressive algorithms[J]. The Journal of Machine Learning Research, 2006, 7:551-585.
    [12] TSURUOKA Y, TSUJII J, ANANIADOU S. Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Stroudsburg, PA, USA: Association for Computational Linguistics, 2009:477-485.
    [13] QIAN X, JIANG X, ZHANG Q, et al. Sparse higher order conditional random fields for improved sequence labeling[C]//Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009:849-856.
    [14] KAJI N, FUJIWARA Y, YOSHINAGA N, et al. Efficient staggered decoding for sequence labeling[C]//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics, 2010:485-494.
    [15] CHU C T, KIM S K, LIN Y A, et al. Map-Reduce for machine learning on multicore[C]//Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference. Cambridge, MA, USA: MIT Press, 2007:281.
    [16] MCDONALD R, HALL K, MANN G. Distributed training strategies for the structured perceptron[C]//Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010:456-464.
    [17] XUE N, SHEN L. Chinese word segmentation as LMR tagging[C]//Proceedings of the Second SIGHAN Workshop on Chinese Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2003:176-179.
    [18] NG H T, LOW J K. Chinese part-of-speech tagging: one-at-a-time or all-at-once? word-based or character-based?[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: ACL, 2004:277-284.
    [19] RABINER L. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2):257-286.
    [20] MCDONALD R, PEREIRA F, RIBAROV K, et al. Non-projective dependency parsing using spanning tree algorithms[C]//Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2005:523-530.
    [21] DEAN J, GHEMAWAT S. MapReduce: Simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1):107-113.
