Research on Software Defect Prediction Technology Based on Data Mining (基于数据挖掘的软件缺陷预测技术研究)

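English title: Research on Software Defect Prediction Technology Based on Data Mining
Author: Chen Yuan (陈媛)
Thesis level: Doctoral
Discipline: Mechatronic Engineering
Chinese keywords: 软件缺陷预测 ; 似然关系模型 ; 高维聚类 ; 缺陷分布模式 ; 混合变量聚类
English keywords: Software Defect Prediction ; Probabilistic Relational Models ; High-dimensional Clustering ; Defect Distribution Pattern ; Mixed-variable Clustering
Degree year: 2012
Supervisor: Shen Xiangheng (沈湘衡)
Discipline code: 080202
Degree-granting institution: Graduate University of the Chinese Academy of Sciences (Changchun Institute of Optics, Fine Mechanics and Physics)
Submission date: 2012-05-01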

Abstract

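Software defects are an inherent property of software, a "by-product" of the development process. Their main harm is that they lower software quality, lengthen the development cycle, and raise development cost. Testing at each stage is an important means of finding software errors promptly and improving quality, and accurately predicting how defects are distributed gives important guidance to that testing work. As computer technology advances, the scale and complexity of software grow geometrically, and predicting the occurrence and distribution of defects accurately and in detail requires analysing more and more influencing factors. Traditional prediction methods struggle with such problems, which involve reasoning over uncertain knowledge with complex causal relationships, and their results are often too broad to be of practical use. To address this difficulty, researchers have begun applying results from other disciplines to software defect prediction, with data mining among the more commonly used.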
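     Building on an in-depth study of data mining techniques and software defects, this thesis carries out research and application work on software defect prediction. It applies two branches of data mining, Probabilistic Relational Models and clustering, to software defect prediction, proposes three prediction algorithms, develops a software defect management and prediction system, and uses that system to implement and experimentally validate the three algorithms.
     The main contributions and results of this thesis are as follows: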
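     1. Classifying and predicting software defects by test method. Current research on software defect prediction focuses mainly on predicting the number and error level of defects. To make the predictions more practical, this thesis classifies and predicts defects according to test method, so that testers can draw up targeted test plans from the predictions and the predictions offer more concrete guidance.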
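     2. A software defect prediction algorithm based on Probabilistic Relational Models. A Probabilistic Relational Model (PRM) is a statistical relational learning method derived from the Bayesian network. It can be applied to reasoning over more complex classes and the dependencies between them, and so has greater power to represent and reason about uncertain knowledge. Each factor in software development that affects the occurrence and distribution of defects can be regarded as an entity class, and these entities or their attributes influence defects directly or indirectly. Software defect prediction is therefore, in effect, a problem of reasoning over uncertain knowledge with complex class dependencies. Very little work has so far used PRMs for defect prediction, so this thesis proposes a PRM-based software defect prediction algorithm, which brings the model's strengths in describing and reasoning about relationships among multi-attribute classes and their uncertainty to defect prediction, and optimises the model to address its shortcomings.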
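     3. A defect-distribution-pattern-oriented software defect prediction algorithm based on high-dimensional clustering. Clustering analysis is a typical unsupervised learning method: it groups the instances in a data set according to selected reference attributes, dividing the data into clusters such that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar. In software development, software developed (or tested) by people of similar ability tends to produce test results with similar distribution patterns. At the same time, classifying defects finely by test method raises the dimensionality of the stored defect data, so finding the patterns hidden in high-dimensional data becomes a pressing problem. High-dimensional clustering has so far seen little use in software defect prediction. This thesis therefore proposes a prediction algorithm that redefines similarity for multidimensional data, constructs a special evaluation function to find the pattern information hidden in the data, and controls the clustering process so as to merge similar test instances, yielding more accurate predictions to guide software testing.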
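     4. A software defect prediction algorithm based on mixed-variable clustering of personnel capability. Defect prediction can run into a cold-start problem: for example, if the historical data holds no records for a newly joined person, no prediction can be made for software that person develops or tests. For this situation the thesis defines a way of measuring personnel capability and, on that basis, proposes a prediction algorithm based on mixed-variable clustering of personnel capability. The algorithm finds people whose capability is similar to the newcomer's and predicts from their historical data.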
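     5. A software defect management and prediction system, used to implement and experimentally validate the three data-mining-based prediction algorithms above. The experiments show that each algorithm has its own characteristics. The PRM-based algorithm has good accuracy and low computational complexity on large-scale data, but lower prediction accuracy on smaller data sets. By comparison, the high-dimensional clustering algorithm is more accurate on small data sets but has high time complexity on large-scale data. When the cold-start problem arises, the mixed-variable clustering algorithm can give an approximate prediction by finding instances with similar attributes. Choosing the algorithm to suit the situation can therefore both improve prediction quality and reduce the time cost of prediction, which matters for improving software quality and lowering development cost.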
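     The results of this thesis enrich the research on and application of data mining in software defect prediction, increase the practical value of defect prediction results, and offer a solution to the cold-start problem in prediction work. All of this contributes positively to research on software defect prediction.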
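As a rough illustration of the PRM-based contribution, the sketch below estimates one class-level conditional probability table of the kind a Bayesian network or PRM attaches to an attribute, here the test-method category of a defect given attributes of related entities. Everything in it is an assumption made for illustration: the attribute names (`skill`, `complexity`), the test-method categories, the toy records, and the use of Laplace smoothing. The abstract does not describe the thesis's actual model structure or its optimisation.

```python
from collections import Counter, defaultdict

# Hypothetical test-method categories used to classify defects.
TEST_METHODS = ["boundary", "interface", "logic"]

def learn_cpd(records, alpha=1.0):
    """Estimate P(test_method | skill, complexity) with Laplace smoothing.

    records: iterable of (skill, complexity, test_method) tuples, i.e. rows
    obtained by joining hypothetical Developer, Module and Defect tables.
    """
    counts = defaultdict(Counter)
    for skill, complexity, method in records:
        counts[(skill, complexity)][method] += 1

    def cpd(skill, complexity):
        # Unseen parent configurations fall back to a uniform distribution.
        seen = counts.get((skill, complexity), Counter())
        total = sum(seen.values()) + alpha * len(TEST_METHODS)
        return {m: (seen[m] + alpha) / total for m in TEST_METHODS}

    return cpd

# Toy training data (invented for this sketch).
records = [
    ("junior", "high", "logic"),
    ("junior", "high", "logic"),
    ("junior", "high", "boundary"),
    ("senior", "high", "interface"),
    ("senior", "low", "boundary"),
]
predict = learn_cpd(records)
print(predict("junior", "high"))
```

Because the table is defined per class rather than per object, the same distribution is shared by every defect whose related developer and module have those attribute values, which is the property that distinguishes a relational model from a flat per-instance one.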
A software defect is an inherent property of software and a "by-product" of the software development process. Its main hazards are reduced software quality, a longer development cycle, and increased development cost. Stage testing is an important tool for finding software defects in time and improving software quality, and accurately predicting the distribution of software defects is of great significance for software testing. With the continuous development of computer technology, software size and complexity are growing exponentially, and to predict the generation and distribution of software defects accurately, people need to analyze more and more influencing factors. Traditional prediction methods have difficulty with this kind of reasoning and prediction, which involves complex causal relationships and uncertain knowledge, and their results are often too broad to have practical significance. To solve this problem, people have begun to apply research achievements from other disciplines to the field of software defect prediction, and data mining techniques are a common tool.
     Based on an in-depth study of data mining technology and software defect prediction techniques, this paper carries out research and application work on software defect prediction. It applies two branches of data mining technology, Probabilistic Relational Models and clustering analysis, to software defect prediction, proposes three prediction algorithms, and develops a software defect management and prediction system to implement the three algorithms and verify their effectiveness. The main contributions and research results of this paper are as follows:
     1. Classifying and predicting software defects based on test methods. At present, the main research direction in the software defect prediction field is how to predict the number and error level of software defects. To improve the practicality of the prediction results, this paper classifies and predicts software defects based on test methods, which makes the results more meaningful: testers can develop a targeted testing plan from them.
     2. Proposing a software defect prediction algorithm based on Probabilistic Relational Models (PRM). PRM is a statistical relational learning method that evolved from the Bayesian network. It can be applied to more complex classes and the relationships between classes, which increases its ability to represent and reason about uncertain knowledge. In the software development process, the entities that affect the generation and distribution of software defects can be regarded as classes, and these entities or their properties affect the generation and distribution of software defects directly or indirectly. The software defect prediction problem is therefore an uncertain-knowledge inference problem with complex dependencies. At present there is very little research that uses PRM to predict software defects, so this paper proposes a software defect prediction algorithm based on PRM and improves the model.
     3. Proposing a software defect distribution pattern prediction algorithm based on high-dimensional clustering. Clustering analysis is a typical unsupervised learning method. It divides the instances in an instance set into clusters based on some of their attributes, so that instances in the same cluster are similar to each other and instances in different clusters are dissimilar. On the one hand, in the software development process, the test results of software projects developed by people with similar abilities tend to have similar defect distribution patterns. On the other hand, because we classify and predict software defects based on test methods, the dimension of the software defect data becomes higher and higher, and finding the implicit patterns in high-dimensional data becomes a major problem. This paper therefore proposes a software defect distribution pattern prediction algorithm based on high-dimensional clustering. With this algorithm, we can find hidden patterns in the defect data and merge software test item classes that have similar attributes, which increases the amount of reference data available for prediction and improves the prediction results.
     4. Proposing a software defect prediction algorithm based on mixed-variable clustering of personnel capability. Another problem in software defect prediction is the cold start of the data. For example, if the historical data contains no records for newly joined developers or testers, predictions for these people are impossible. For this case, this paper presents a method of measuring personnel capability and, based on it, proposes a software defect prediction algorithm based on mixed-variable clustering of personnel capability. This algorithm finds people whose capability is similar to that of the new person and makes a prediction from their corresponding data.
     5. Developing a software defect management and prediction system, and using it to implement and experimentally test the three algorithms. The experimental results show that the three algorithms each have their own characteristics. The software defect prediction algorithm based on PRM has high accuracy and low computational complexity on large-scale data, but its accuracy is lower on small-scale data. In comparison, the software defect distribution pattern prediction algorithm based on high-dimensional clustering has higher accuracy on small-scale data, but its computational complexity is higher on large-scale data. When the cold-start problem arises, the software defect prediction algorithm based on mixed-variable clustering of personnel capability can produce an approximate result by collecting the instances in the instance set that have similar properties. We can therefore improve the quality of the prediction results and reduce the time overhead of prediction by flexibly selecting the appropriate algorithm for the actual situation, which is very important for improving software quality and reducing development costs.
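The defect-distribution-pattern idea in item 3 can be sketched on toy data as follows. This is not the thesis's algorithm: the abstract says the thesis redefines similarity and builds its own evaluation function, neither of which is given, so the sketch swaps in plain cosine similarity and a greedy single-pass ("leader") clustering. The test item names, the three test-method columns, and the 0.95 threshold are all hypothetical.

```python
import math

def normalize(counts):
    """Turn raw defect counts per test method into a distribution."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster(items, threshold=0.95):
    """Greedy single-pass clustering: join the most similar cluster whose
    centroid is at least `threshold` similar, else start a new cluster."""
    clusters = []  # each cluster is a list of (name, distribution) pairs
    for name, counts in items.items():
        dist = normalize(counts)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(dist, centroid([d for _, d in c]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([(name, dist)])
        else:
            best.append((name, dist))
    return clusters

# Hypothetical history: defects found per test method (columns) per test item.
history = {
    "item_a": [8, 1, 1],
    "item_b": [16, 2, 2],
    "item_c": [1, 1, 8],
    "item_d": [2, 1, 9],
}
clusters = cluster(history)
for c in clusters:
    names = [name for name, _ in c]
    pattern = [round(x, 2) for x in centroid([d for _, d in c])]
    print(names, pattern)
```

Merging items with the same pattern pools their records, so a cluster centroid rests on more reference data than any single test item would provide. Real defect data classified by many test methods would have far more than three columns.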
     The research in this paper enriches the work on applying data mining technology to software defect prediction, improves the practical value of software defect prediction results, and proposes a solution to the cold-start problem. This work contributes positively to related research on software defect prediction.
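The cold-start approach in item 4 can be illustrated with a small sketch that finds the known people most similar to a newcomer over mixed numeric and categorical attributes, then averages their history. The abstract does not state the thesis's capability measure, so a Gower-style distance is used here as a stand-in. The attribute names, the people, and the defect-rate figures are invented for the example.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style distance over dicts of mixed attributes.

    Numeric attributes contribute |a - b| / range; categorical attributes
    contribute 0 when equal and 1 otherwise. The result is the mean.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            rng = numeric_ranges[key]
            total += abs(a[key] - b[key]) / rng if rng else 0.0
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

def predict_for_newcomer(newcomer, staff, history, k=2):
    """Average the defect history of the k known people closest to the newcomer."""
    everyone = list(staff.values()) + [newcomer]
    ranges = {
        key: max(p[key] for p in everyone) - min(p[key] for p in everyone)
        for key, value in newcomer.items()
        if isinstance(value, (int, float))
    }
    nearest = sorted(
        staff, key=lambda name: gower_distance(newcomer, staff[name], ranges)
    )[:k]
    rates = [history[name] for name in nearest]
    return nearest, [sum(col) / len(rates) for col in zip(*rates)]

# Hypothetical personnel attributes (one numeric, two categorical).
staff = {
    "ann": {"years": 8, "degree": "msc", "role": "developer"},
    "bob": {"years": 1, "degree": "bsc", "role": "developer"},
    "eve": {"years": 2, "degree": "bsc", "role": "tester"},
}
# Hypothetical average defects per module, one column per test method.
history = {
    "ann": [0.5, 1.0],
    "bob": [3.0, 2.0],
    "eve": [2.0, 4.0],
}
newcomer = {"years": 1, "degree": "bsc", "role": "developer"}
nearest, predicted = predict_for_newcomer(newcomer, staff, history)
print(nearest, predicted)
```

The newcomer has no history of their own, so the estimate is only as good as the similarity measure and the neighbours' data. This matches the abstract's description of the result as approximate.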
