A Chinese Address Recognition Method Based on Address Semantic Understanding
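  • English title: A Chinese address recognition method based on address semantics
  • Authors: LI Xiao-lin (李晓林); ZHANG Yi (张懿); LI Lin (李霖)
  • Affiliations: Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology; School of Resource and Environmental Science, Wuhan University
  • Keywords: address semantics; element feature characters; transition probability; dictionary-free
  • Keywords as published in English: address semantics; feature character word; transfer probability; without dictionary
  • Journal code: JSJK
  • Journal: Computer Engineering & Science (计算机工程与科学)
  • Publication date: 2019-03-15
  • Publisher: Computer Engineering & Science
  • Year: 2019
  • Volume and issue: v.41; No.291
  • Funding: National Key Research and Development Program of China, 13th Five-Year Plan (2017YFB0503701); National High-Tech R&D Program of China (863 Program) (2013AA12A202); Special Scientific Research Fund for Public Welfare Industry of Surveying, Mapping and Geoinformation (201412014); Natural Science Foundation of Hubei Province (2013CFA125)
  • Language: Chinese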
  • Article ID: JSJK201903024
  • Page count: 8
  • Issue: 03
  • CN: 43-1258/TP
  • Pages: 171-178
Abstract
Chinese address text on the Internet carries rich spatial location information. To extract address location information from such text more effectively, this paper proposes an address recognition method based on address semantic understanding. Word-frequency statistics over a training corpus are used to derive a set of address element feature characters and the character transition probabilities, from which a feature character transition probability matrix is constructed. Combined with a string maximum joint probability algorithm, this yields an address recognition method that depends on neither a place-name dictionary nor part-of-speech tagging. Experimental results show that, for Chinese addresses that have prominent element feature characters and contain ambiguity, the method achieves an exact match rate of 76.85% and a recognition accuracy of 93.11%. Comparative experiments against a mechanical matching algorithm and a method that constructs the transition probability matrix from experience demonstrate the usability and effectiveness of the proposed method.
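The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the `START` marker, the smoothing constant `alpha`, the rule that the last character of each training element is its feature character, and the function names `train` and `segment` are all assumptions made for illustration. The sketch estimates a feature character transition probability matrix from counts, then picks the set of cut points whose transitions have the highest joint probability, without consulting any place-name dictionary.

```python
import math
from collections import Counter, defaultdict

START = "^"  # hypothetical marker for the beginning of an address

# Hypothetical toy corpus of pre-segmented addresses (an assumption, not the
# paper's data). The last character of each element is its feature character.
TRAINING = [
    ["湖北省", "武汉市", "洪山区", "珞喻路", "129号"],
    ["湖北省", "武汉市", "江汉区", "解放大道", "688号"],
    ["北京市", "海淀区", "学院路", "29号"],
    ["广东省", "深圳市", "南山区", "科苑路", "5号"],
    ["湖南省", "长沙市", "岳麓区", "东风村"],
]


def train(corpus, alpha=0.1):
    """Estimate smoothed transition probabilities between feature characters."""
    features = sorted({elem[-1] for addr in corpus for elem in addr})
    counts = defaultdict(Counter)
    for addr in corpus:
        prev = START
        for elem in addr:
            counts[prev][elem[-1]] += 1
            prev = elem[-1]
    probs = {}
    for prev in [START] + features:
        # Additive smoothing keeps every probability strictly positive.
        total = sum(counts[prev].values()) + alpha * len(features)
        probs[prev] = {f: (counts[prev][f] + alpha) / total for f in features}
    return set(features), probs


def segment(text, features, probs):
    """Cut after the feature characters that maximize the joint probability."""
    cuts = [i for i, ch in enumerate(text) if ch in features]
    if not cuts:
        return [text]
    best = {}  # cut index -> (best log-probability, previous cut index or None)
    for i in cuts:
        candidates = [(math.log(probs[START][text[i]]), None)]
        for j in cuts:
            if j >= i:
                break
            score = best[j][0] + math.log(probs[text[j]][text[i]])
            candidates.append((score, j))
        best[i] = max(candidates, key=lambda c: c[0])
    # Backtrack from the last feature character in the string.
    path, i = [], cuts[-1]
    while i is not None:
        path.append(i)
        i = best[i][1]
    path.reverse()
    pieces, start = [], 0
    for i in path:
        pieces.append(text[start:i + 1])
        start = i + 1
    if start < len(text):
        pieces.append(text[start:])  # trailing text without a feature character
    return pieces


features, probs = train(TRAINING)
print(segment("湖北省武汉市洪山区幸福村路8号", features, probs))
# ['湖北省', '武汉市', '洪山区', '幸福村路', '8号']
```

In this toy run the ambiguous string "幸福村路" stays in one piece because the single transition 区→路 is more probable than the chain 区→村→路. Log-probabilities are summed instead of multiplying raw probabilities, which avoids underflow on long addresses, and the dynamic program keeps the search quadratic in the number of candidate cut points.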
