Mining and Organizing Software Functional Features Based on StackOverflow Data
  • English title: Mining and Organizing Software Functional Features Based on StackOverflow Data
  • Authors: 朱子骁; 邹艳珍; 华晨彦; 沈琦; 赵俊峰
  • Authors (English): ZHU Zi-Xiao; ZOU Yan-Zhen; HUA Chen-Yan; SHEN Qi; ZHAO Jun-Feng; Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, School of Electronics Engineering and Computer Science, Peking University; Beida (Binhai) Information Research
  • Keywords: software reuse; functional feature; software documentation; StackOverflow; natural language syntax parsing; frequent subgraph mining
  • English keywords: software reuse; functional feature; software documentation; Stack Overflow; natural language syntax parsing; frequent subgraph mining
  • Journal (Chinese abbreviation): RJXB
  • Journal (English): Journal of Software
  • Affiliations: 高可信软件技术教育部重点实验室(北京大学); 北京大学信息科学技术学院软件研究所; 北京大学(天津滨海)新一代信息技术研究院
  • Publication date: 2018-03-13 17:30
  • Publisher: Journal of Software (软件学报)
  • Year: 2018
  • Issue: v.29
  • Funding: National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)
  • Language: Chinese
  • Pages: RJXB201808004
  • Page count: 16
  • CN: 08
  • ISSN: 11-2560/TP
  • Classification number: 38-53
Abstract
Functional specification documents are an important resource for developers who want to understand and reuse unfamiliar software libraries. Because writing them costs considerable time and human effort, many software projects do not provide complete official functional documentation. However, the communication records produced while software is developed and used contain a great deal of information about its functions and their usage. This paper proposes an approach that automatically mines and organizes the functional features of open source software from Stack Overflow question-and-answer data. The approach describes functional features as verb-object phrases and generates a hierarchical list of software functional features as a supplement to the software documentation. In an experimental evaluation on real-world projects, the automatically generated documents cover 97.6% of the frequently used functional features listed in the official documents. The approach can also be adapted to other types of software communication records and applied to software in different domains.
