Research and Implementation of Asset Retrieval Methods in a Reusable Asset Management System
Abstract
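As the software industry has developed, demand for software has grown rapidly and software systems have become larger, so more and more software enterprises have recognized the importance of software reuse. The most effective way for a software enterprise to practice reuse is to reuse its own assets. A reusable asset management system takes the Reusable Asset Specification proposed by the Object Management Group (OMG) as its theoretical basis and provides description, storage, and retrieval of reusable assets within the enterprise. A major technical problem in developing such a system is how to retrieve the large number of assets it holds: a well-chosen retrieval method greatly reduces the cost of finding and understanding assets, whereas a poor one makes the system harder for the enterprise to use and ultimately causes reuse to fail.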
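     Based on the current state of software reuse in domestic software enterprises and on their requirements, this thesis settles on two retrieval methods: keyword retrieval based on asset manifests (asset entity description files) and domain faceted classification retrieval. They suit the early and later stages, respectively, of an enterprise's adoption of a reusable asset management system, and so can follow developers' growing experience with software reuse. The thesis focuses on how to implement the two methods, which involves improving several mature retrieval techniques and applying them in the reusable asset management system so that retrieval better meets enterprise needs.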
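     First, the thesis studies the Reusable Asset Specification and implements the asset manifest, an XML document that contains an asset's metadata. Among the elements of the manifest, it implements descriptions of the asset's keywords and of its domain facet terms; these descriptions are used to build inverted indexes over the assets and so improve retrieval efficiency.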
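     As an illustration of how keyword descriptions in a manifest can feed an inverted index, here is a minimal Python sketch. It is not the thesis's implementation: the sample manifests, the `keyword` tag name, and the functions `build_inverted_index` and `search` are assumptions made for this example, and real manifests follow the Reusable Asset Specification schema rather than this toy layout.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified manifests; not the real RAS schema.
MANIFESTS = {
    "asset-1": "<asset><classification><keyword>logging</keyword>"
               "<keyword>xml</keyword></classification></asset>",
    "asset-2": "<asset><classification><keyword>xml</keyword>"
               "<keyword>parser</keyword></classification></asset>",
}


def build_inverted_index(manifests):
    """Map each lower-cased keyword to the set of asset ids that declare it."""
    index = defaultdict(set)
    for asset_id, xml_text in manifests.items():
        root = ET.fromstring(xml_text)
        for node in root.iter("keyword"):
            term = (node.text or "").strip().lower()
            if term:
                index[term].add(asset_id)
    return index


def search(index, query_terms):
    """Return the assets that declare every query term (AND semantics)."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(MANIFESTS)
print(search(index, ["xml", "parser"]))
```

     Looking a term up in the index touches only the assets that declare it, which is why an inverted index avoids scanning every manifest at query time.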
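     Second, the thesis describes in detail how traditional information retrieval techniques are used to extract keywords from asset manifests, encode the manifests, and implement keyword retrieval through an inverted index. For keyword extraction, it proposes that each asset's keyword sequence be specified manually, which works around the lack of a dictionary for the software reuse domain, and it uses a forward matching algorithm to extract keywords from the manifest. To make results finer-grained and help users reach the most relevant information within an asset, it studies how, for a given sequence of query keywords, Dewey encoding can be used to find the smallest common ancestor nodes of the keywords in the manifest tree. To rank results, it studies a formula for the relevance between keywords and an asset manifest, measuring relevance by both the probability distribution of the keywords and their positions in the description document.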
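     The keyword extraction step can be pictured with a forward maximum matching sketch in Python. This is one common reading of "forward matching", assumed here rather than confirmed by the abstract; the function name `forward_match` and the sample keyword list are hypothetical. The scan works on characters, so it applies equally to unsegmented Chinese text.

```python
def forward_match(text, keywords):
    """Scan left to right, taking the longest listed keyword at each position."""
    vocab = set(keywords)
    max_len = max((len(k) for k in vocab), default=0)
    found = []
    i = 0
    while i < len(text):
        matched = None
        # Try the longest possible candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                matched = candidate
                break
        if matched:
            found.append(matched)
            i += len(matched)  # continue after the matched keyword
        else:
            i += 1  # no keyword starts here; move one character forward
    return found


# Manually specified keyword list (an assumption for this example).
KEYWORDS = ["asset", "asset retrieval", "reusable"]
print(forward_match("reusable asset retrieval method", KEYWORDS))
```

     Because the longest candidate is tried first, "asset retrieval" is preferred over the shorter "asset" when both appear in the manually specified list.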
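     In addition, the thesis improves on the traditional facet scheme. After analyzing its shortcomings, it proposes a domain facet classification scheme based on FODA (Feature-Oriented Domain Analysis). The scheme divides all facets into three layers, and the facet group in each layer corresponds to one of the three FODA phases: determining the domain boundary and building a boundary (context) model, extracting functional requirements and building a feature model, and refining the domain analysis and building an architecture model. The facet terms in each layer correspond to the feature terms in the boundary model, the feature model, and the architecture model, respectively. In implementing domain faceted classification retrieval, facet terms stand in general/special relationships to one another; so that matching terms against assets reflects these relationships, the facet description file is encoded, and the properties of Dewey encoding are used to determine all sub-terms of a term, generate the set of matching facet terms, and compute term weights.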
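     Finally, the thesis describes the design and implementation of the asset retrieval module in detail. The module is implemented as the models of the MVC pattern, and the key techniques and core code behind these models are presented.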
With the development of the software industry, demand for software has increased rapidly and software systems have grown in scale, so more and more software enterprises have realized the importance of software reuse. The most effective way for a software enterprise to practice reuse is to reuse its own software assets. The reusable asset management system, which is based on the Reusable Asset Specification published by the OMG, implements asset description, asset storage, and asset retrieval. How to retrieve the assets in the system is a main problem in designing it. A proper retrieval method effectively reduces the cost of retrieval; a crude one makes the system harder to use and results in the failure of software reuse.
     Based on the state of software reuse in domestic software enterprises and on their requirements, this paper proposes two retrieval methods: keyword retrieval based on asset manifests, and domain faceted retrieval. Keyword retrieval suits enterprises in the early stage of adopting a reusable asset management system; domain faceted retrieval suits them once adoption has matured. Together the two methods adapt to developers' growing experience with software reuse. The paper focuses on implementing these two methods. In the course of the research, we improve some mature retrieval methods and apply them to the reusable asset management system.
     First, the paper studies the Reusable Asset Specification and creates asset manifests based on it. Asset manifests are XML files that contain the metadata of assets. We add descriptions of asset keywords and facet terms under the classification element of the manifest. These descriptions are used to build the inverted lists for assets.
     Second, the paper studies how to extract keywords from asset manifests, encode the manifests, and build inverted lists. For keyword extraction, we propose defining the keyword list manually, which avoids the need for a word-segmentation dictionary of the software reuse domain, and we use a forward matching segmentation algorithm to extract keywords from sentences. To help users reach the most relevant information in an asset, the paper studies how to find the smallest common ancestor of the keywords in a manifest tree by analyzing the Dewey IDs of nodes. To rank retrieval results, we study how to calculate the correlation between an asset and a keyword list, which depends on keyword statistics and on the positions of the keywords in the asset manifest.
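     The following Python sketch illustrates why Dewey IDs make the common-ancestor search convenient: the lowest common ancestor of several nodes is simply the longest common prefix of their IDs. It is a brute-force illustration under assumed data, not the thesis's algorithm; `common_prefix`, `is_ancestor`, `smallest_lcas`, and the sample postings are hypothetical names and values.

```python
from itertools import product


def common_prefix(codes):
    """Longest common prefix of Dewey IDs, i.e. the ID of their lowest common ancestor."""
    prefix = []
    for parts in zip(*codes):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def is_ancestor(a, b):
    """True if the node with Dewey ID a is a proper ancestor of the node with ID b."""
    return len(a) < len(b) and b[:len(a)] == a


def smallest_lcas(postings):
    """postings holds one list of Dewey IDs per query keyword.

    Returns the common ancestors that have no other candidate below them.
    Brute force over all combinations, so only suitable for small inputs.
    """
    if not postings or any(not p for p in postings):
        return []
    candidates = {common_prefix(combo) for combo in product(*postings)}
    return sorted(
        c for c in candidates
        if not any(is_ancestor(c, other) for other in candidates)
    )


# Keyword A occurs at nodes 1.1.1 and 1.2.1; keyword B at nodes 1.1.2 and 1.3.
postings = [[(1, 1, 1), (1, 2, 1)], [(1, 1, 2), (1, 3)]]
print(smallest_lcas(postings))
```

     In this example node 1.1 is returned instead of the root, because it is the deepest element whose subtree still contains both keywords, which is what makes the result finer-grained than returning the whole manifest.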
     Moreover, the paper proposes an improvement on the traditional facet scheme. After analyzing the shortcomings of the traditional scheme, we present a domain facet scheme based on FODA (Feature-Oriented Domain Analysis), which has three layers. Each layer in this facet scheme corresponds to a phase of FODA, and the terms in each layer correspond to terms in the context model, the feature model, and the architecture model, respectively. In implementing domain faceted retrieval, the terms in a facet stand in ancestor/descendant relationships. To reflect these relationships when matching assets, we encode the facet manifest so that we can find all descendants of a term, create the matching-term list, and calculate term weights.
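     A small Python sketch of the descendant test on encoded facet terms follows. The term table, the function names, and especially the weighting rule are assumptions made for illustration: the abstract does not give the weight formula, so the sketch uses a simple decay of 1 / (1 + depth difference) purely as a placeholder.

```python
# Hypothetical facet terms keyed by Dewey code; not taken from the thesis.
FACET_TERMS = {
    "1": "Database",
    "1.1": "Relational",
    "1.1.1": "SQL query",
    "1.2": "Document store",
    "2": "User interface",
}


def parse(code):
    """Turn '1.1.1' into (1, 1, 1) so that prefixes compare by component."""
    return tuple(int(part) for part in code.split("."))


def matching_terms(terms, query_code):
    """Return the query term and all of its descendants with a weight each.

    The weight 1 / (1 + depth difference) is an assumed placeholder,
    not the formula used in the thesis.
    """
    query = parse(query_code)
    result = {}
    for code, name in terms.items():
        dewey = parse(code)
        if dewey[:len(query)] == query:  # query is the term itself or an ancestor
            result[name] = 1.0 / (1 + len(dewey) - len(query))
    return result


print(matching_terms(FACET_TERMS, "1.1"))
```

     Comparing parsed tuples instead of raw strings avoids false matches such as treating code 1.10 as a descendant of code 1.1.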
     Lastly, the paper describes the design and implementation of the asset retrieval modules. We implement these modules as the models of the MVC framework.
