Research and Application of Data Mining Methods for Road Transport Information Systems
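Abstract
Road transport is the largest component of China's comprehensive transportation system, and road transport information systems are of great significance to road transport management, services, and the development of the industry. Data mining is the key technique for discovering and exploiting the knowledge embedded in road transport data and for enabling deeper applications of these systems. Starting from top-level design issues, such as the model architecture of road transport information systems, and from the requirements for data mining, and taking into account the strengths and weaknesses of existing data mining theories and methods, this dissertation proposes methods suited to mining the various kinds of knowledge in road transport in four areas: association rule methods, classification methods, comprehensively optimized classification methods, and clustering methods. Each method is validated in a working application system, and all are finally implemented together in the Guangdong Province road transport information system. The main research work and principal results are summarized as follows:
     1. The foundations of data mining in road transport information systems are studied, including the model architecture and data warehouse design. Basic designs for data types, data relationships, and the data warehouse are proposed, with emphasis on the design of a typical data mart: the IC card road transport electronic certificate system.
     2. Building on a comparative analysis of the essential differences between the classic association rule algorithm Apriori and its optimized variant Eclat, the dissertation proposes and proves, for the first time, properties that determine whether candidate sets can be pruned when items are used as prefixes or as suffixes. Combined with the cloud computing programming model MapReduce, these properties lead to a more optimized method for computing frequent itemsets, the parallel NEclat method, in which two stages of Map and Reduce functions are designed so that pruning runs in parallel. The method is validated on vehicle deployment data from the road transport management information system.
     3. The strengths and weaknesses of general classification methods are analyzed, namely the distance-based k-nearest neighbors method, decision trees, and Bayesian classification, together with their scope of applicability to data mining in road transport information systems. Application methods are proposed and validated on practitioner management data from the road transport information system. Then, based on the province-wide public transit smart card application, a cross-district consumption estimation matrix model resembling the BP neural network classification method is built. Key parameters such as the error threshold and learning rate are set according to the actual application, the estimation matrix for cross-district consumption is obtained by training on real smart card consumption data, and the result is verified with real test data.
     4. Building on the general descriptive theory of classification problems, an abstract model of the classification data mining problem is proposed, and rough set theory is introduced to reveal the essence of this model. Combining the Apriori algorithm for association rules with rough set theory, a series of methods is then proposed for condition attribute reduction, rule computation, and rule simplification, optimizing the mining of both association knowledge and classification knowledge. The use of rough set methods to obtain the relationship between the number of rules and the support and confidence levels is proposed for the first time. The methods are tested on two practical problems from the road transport information system: quality and credit assessment, and fuel consumption limits.
     5. To address the shortcomings of DBSCAN, a typical density-based clustering algorithm, the principles of attribute dimension partitioning and cluster merging are proposed and proved. Combining the three principles, an optimized DBSCAN algorithm based on MapReduce is proposed, with Map and Reduce functions designed for cluster merging so that the computation runs in parallel. The execution efficiency of the new and original algorithms is compared, and the approach is validated on a practical satellite positioning application.
     6. Starting from the construction of the business, application, data, and technical architecture models of the Guangdong Province road transport information system, the discussion focuses on data types and characteristics, data relationships, and database planning. On this basis, the data mining requirements are analyzed comprehensively and an overall solution approach is proposed. The full data mining process is then implemented in the satellite positioning data management subsystem of the Guangdong Province road transport information system using the advanced modeling and analysis tool Cognos.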
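Item 2 above contrasts Apriori with Eclat. As a rough illustration of the basic Eclat idea only (not the dissertation's parallel NEclat method, whose pruning properties and MapReduce design are not given in this abstract), the following Python sketch mines frequent itemsets from a vertical layout by intersecting transaction-id sets. The sample records, the attribute values, and the function name eclat are assumptions made for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset), where tidset already reflects
        # the intersection with every item in the current prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                # Extensions below the support threshold are dropped here,
                # so none of their supersets is ever generated.
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


# Hypothetical vehicle records, invented for this sketch.
records = [
    {"bus", "diesel", "urban"},
    {"bus", "diesel"},
    {"truck", "diesel"},
    {"bus", "urban"},
]
result = eclat(records, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Unlike Apriori, which rescans the transactions to count each generation of candidates, this layout obtains a support count as the size of a set intersection.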
Road transport is the largest component of China's comprehensive transportation system, and the road transport information system plays an important role in road transport management, services, and industry development. Data mining of the road transport information system is the key technique for making deeper use of the system and its data. Starting from the model framework and top-level design of the road transport information system and from data mining requirements, and weighing the advantages and disadvantages of a variety of data mining theories, this dissertation proposes mining methods for various kinds of knowledge in road transport in five areas (statistical analysis, association rules, classification, optimized classification, and clustering) and verifies them in practical application systems. The important findings of this dissertation include:
     1. This dissertation studies basic questions of data mining in the road transport information system, such as model architecture and the data warehouse, and proposes designs for data types, data relationships, and the data warehouse, focusing on the design and application of a typical data mart, the IC card road transportation electronic certificate system.
     2. After comparing the conventional association rule algorithm Apriori with its optimized variant Eclat and identifying the essential difference between them, this dissertation is the first to put forward and prove two properties that determine whether a candidate set can be pruned when items serve as prefixes or as suffixes. Combined with the cloud computing programming model MapReduce, it then puts forward a more optimized frequent itemset calculation method, Parallel NEclat, with two stages of Map and Reduce functions designed to perform pruning in parallel. Finally, the method is verified on vehicle input data from the road transportation management information system.
     3. This dissertation studies and analyzes the strengths and weaknesses of the general classification methods in data mining: the k-nearest neighbors method, the decision tree classification method, and the Bayesian classification method. It analyzes their scope of application to data mining in the road transportation information system, presents application methods, and verifies them on personnel management data from the road transportation information system. Based on the application of the public transit smart card system, a cross-district consumption matrix model resembling a BP neural network classification method is built. Key parameters such as the error threshold and the learning rate are set according to the actual application, the cross-district consumption matrix is obtained by training on actual card consumption data, and the result is verified with actual test data.
     4. An abstract model of the classification data mining problem is proposed on the basis of the general descriptive theory of classification problems, and rough set theory is used to reveal the essence of this model. Combining the Apriori algorithm for association rules with rough sets, a series of methods for condition attribute reduction, rule calculation, and rule simplification is put forward, optimizing the mining of association and classification knowledge. This dissertation is the first to propose using rough sets to obtain the relationship between the number of rules and the degrees of support and confidence. Finally, the methods are tested on the quality credibility evaluation and fuel limit value problems in the road transportation information system.
     5. To address the shortcomings of DBSCAN, a typical density-based clustering algorithm, this dissertation proposes and proves the principles of attribute dimension partitioning and cluster merging, and then puts forward an optimized DBSCAN algorithm based on MapReduce that combines the three principles. Map and Reduce functions for cluster merging are designed so that the computation runs in parallel. The efficiency of the new and original algorithms is compared, and the approach is verified on a practical satellite positioning application.
     6. Starting from the construction of the business, application, data, and technical architecture models of the Guangdong Province road transportation information system, with emphasis on data types and features, data relationships, and database planning, this dissertation presents a comprehensive analysis of data mining requirements and puts forward an overall solution approach. Finally, the whole data mining process is implemented in the satellite positioning data management subsystem of the Guangdong Province road transportation information system using the advanced modeling and analysis tool Cognos.
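Item 4 relies on rough set theory. As a minimal sketch of the standard rough set lower and upper approximations (not the dissertation's attribute reduction or rule simplification methods, which this abstract does not spell out), the Python below groups objects that are indiscernible on the chosen condition attributes and approximates a target set. The operator records, attribute names, and the function name approximations are hypothetical.

```python
from collections import defaultdict


def approximations(objects, condition_attrs, target):
    """Return (lower, upper) approximations of `target`.

    objects: dict mapping object id -> dict of attribute values
    condition_attrs: attribute names used to decide indiscernibility
    target: set of object ids to approximate
    """
    # Objects with identical values on all condition attributes
    # fall into the same equivalence class.
    classes = defaultdict(set)
    for obj_id, attrs in objects.items():
        key = tuple(attrs[a] for a in condition_attrs)
        classes[key].add(obj_id)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the target
            lower |= members
        if members & target:    # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical transport operators, invented for this sketch.
operators = {
    1: {"complaints": "low", "violations": "none"},
    2: {"complaints": "low", "violations": "none"},
    3: {"complaints": "high", "violations": "none"},
    4: {"complaints": "high", "violations": "some"},
    5: {"complaints": "high", "violations": "none"},
}
rated_good = {1, 2, 3}
lower, upper = approximations(operators, ["complaints", "violations"], rated_good)
print(sorted(lower), sorted(upper))
```

Objects in the lower approximation can be classified with certainty from the condition attributes, while those in the upper approximation but not the lower form the boundary region where the attributes cannot separate the classes.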
