A Comparative Study of Economic Regionalization Based on Clustering Algorithms
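Abstract
Macroeconomic regionalization is one of the traditional topics of regional economic research and also a practical planning problem; it is an important basis for the rational distribution of productive forces and for regional economic cooperation. It plays an important role in national economic development: a scientific and reasonable regionalization helps in analyzing the current state of the economy, formulating economic plans, and promoting development. China, with its long tradition of a planned economy and its large regional disparities, pays particular attention to economic regionalization. Since the 1950s China has issued a series of economic regionalization schemes, which played a positive role in economic development under the planned economy. With the rapid growth of the Chinese economy, more and more scholars have recognized the importance of numerical classification methods for delineating economic regions. Based on national macroeconomic statistics, this thesis applies the traditional hierarchical clustering method and a qualitative biclustering method with comparatively strong current performance to study in depth the regionalization scheme for China's provinces and their modes of economic growth, and ventures some predictions. The results show that hierarchical clustering gives a relatively intuitive, easy-to-apply analysis of nationwide regionalization as a whole, while biclustering has a distinctive advantage in uncovering what provinces have in common under particular subsets of attributes.
     This thesis first reviews existing methods and results in economic regionalization, points out the limitations of current methods, and on that basis proposes its own approach. It then briefly introduces the basic principles and categories of clustering and biclustering, focusing in particular on the steps and characteristics of QUBIC, a currently popular biclustering algorithm, and modifies the algorithm to suit the characteristics of economic regionalization. Next, hierarchical clustering and the modified qualitative biclustering method are applied to macroeconomic data for China's 31 provincial-level regions over 1999-2007, and the results are discussed from several angles. Finally, based on that discussion, the thesis presents a regionalization scheme for China and offers some policy suggestions.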
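     In particular, this thesis makes the following contributions:
     1. In view of the characteristics of macroeconomic data, two dimension-reduction schemes are used that preserve the year information in the data. The first tags each year's indicators with the year and treats them as new indicators, reducing the data to a two-dimensional space of new indicators by provinces. The second tags each province's data in each year with the year and treats it as a new province, reducing the data to a two-dimensional space of indicators by new provinces. Both schemes avoid the information loss incurred by averaging an indicator over a time period. Both clustering methods are applied to each of the two resulting data sets and the results are compared systematically, opening a new line of research for economic regionalization.
     2. General clustering methods compare objects over all attributes, using a similarity measure to determine how alike they are and classify them accordingly. When the data contain too many attributes, however, the objects being clustered do not necessarily correlate well under every attribute, which lowers the specificity of the classification. To overcome this limitation, this thesis applies biclustering to economic regionalization for the first time. The results show that the same province can appear in several biclusters, which removes the restriction in traditional clustering that each object belongs to exactly one cluster; and provinces that fall in different clusters may be grouped together in a bicluster, indicating that provinces differing in overall development level can still behave similarly on certain attributes.
     3. The QUBIC biclustering algorithm is improved. Considering the differences between economic data and bioinformatics data, and in light of the characteristics of economic data and the principles of the algorithm, the author revises the data discretization method of the original algorithm, making comparisons between objects more reasonable and increasing the practical value of the results.
     4. Heat maps are applied, as a novel step, to the analysis of macroeconomic regionalization, making the results visual and intuitive. The dendrogram in each figure shows how closely the clustered objects are related, and the heat map, drawn in red, black and green, directly shows three levels of economic performance: good, medium and poor.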
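The two dimension-reduction schemes in item 1 can be illustrated with a minimal sketch. The province names, indicator names, and values below are made-up placeholders, and the function names are hypothetical; the thesis's actual data layout is not given in this record.

```python
# Hypothetical toy records: (province, indicator, year) -> value.
# All names and numbers are placeholders, not data from the thesis.
data = {
    ("A", "x", 1999): 1.0, ("A", "x", 2000): 1.2,
    ("A", "y", 1999): 0.5, ("A", "y", 2000): 0.6,
    ("B", "x", 1999): 0.3, ("B", "x", 2000): 0.4,
    ("B", "y", 1999): 0.1, ("B", "y", 2000): 0.2,
}

def unfold_by_indicator_year(data):
    """Scheme 1: rows are provinces, columns are year-tagged indicators."""
    table = {}
    for (prov, ind, year), value in data.items():
        table.setdefault(prov, {})[f"{ind}_{year}"] = value
    return table

def unfold_by_province_year(data):
    """Scheme 2: rows are year-tagged provinces, columns are indicators."""
    table = {}
    for (prov, ind, year), value in data.items():
        table.setdefault(f"{prov}_{year}", {})[ind] = value
    return table

wide = unfold_by_indicator_year(data)  # 2 rows (provinces) x 4 columns
long = unfold_by_province_year(data)   # 4 rows (province-years) x 2 columns
```

In both tables every yearly value survives as its own cell, whereas averaging each indicator over the period would collapse the year dimension entirely.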
Macroeconomic regionalization is not only one of the traditional problems of regional economic research but also a practical planning problem. It is an important basis for distributing productive forces rationally and developing regional economic cooperation. Macroeconomic regionalization occupies an important position in the economic development of a country. A scientific and rational economic regionalization helps in analyzing the status of the economy, formulating economic plans, and promoting economic development. Because of its long tradition of a planned economy and its large regional disparities, China places particular emphasis on economic regionalization. Since the 1950s, China has promulgated a series of economic regionalization schemes, which had a positive impact on economic development under the planned economy. With the swift development of the Chinese economy, more and more scholars have come to realize the importance of numerical classification in partitioning economic regions. Based on national macroeconomic statistical data, this thesis studies in depth, and makes some bold predictions about, the economic regionalization scheme and the mode of economic growth of each province, using the traditional hierarchical clustering method and a qualitative biclustering method that currently performs comparatively well. The results show that hierarchical clustering is convenient to apply to nationwide regionalization and gives comparatively intuitive overall results, while biclustering has a particular advantage in revealing similarities between provinces under specific subsets of attributes.
     First, this thesis summarizes the methods and results of previous studies of economic regionalization, points out the limitations of existing methods, and proposes its own approach. The following part summarizes the classification and basic principles of clustering and biclustering algorithms, with particular emphasis on the steps and features of QUBIC, a very popular biclustering algorithm, and makes some improvements to this algorithm in view of the characteristics of economic regionalization. The thesis then uses hierarchical clustering and the improved QUBIC biclustering algorithm to analyze macroeconomic data from 1999 to 2007 for the 31 provinces of China and discusses the results from multiple aspects. Finally, based on that discussion, it proposes an economic regionalization scheme for China and offers some policy suggestions.
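As a rough illustration of the hierarchical clustering step, the following naive sketch merges the two closest groups until a target number of groups remains. The choice of Euclidean distance and average linkage, the row labels, and the values are all assumptions made for the example; this record does not state which distance or linkage the thesis uses.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length indicator vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance until n_clusters remain."""
    clusters = [[name] for name in rows]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

# Placeholder rows: one made-up indicator vector per hypothetical province.
rows = {
    "P1": [0.9, 0.8], "P2": [1.0, 0.9],
    "P3": [0.1, 0.2], "P4": [0.2, 0.1], "P5": [0.15, 0.15],
}
groups = average_linkage(rows, n_clusters=2)
```

Recording the merge order and distances, rather than only the final groups, is what yields the tree structure that a dendrogram displays.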
     In particular, this thesis makes the following innovations:
     1. In view of the features of macroeconomic data, this thesis applies two dimension-reduction methods whose results retain the temporal character of the data. The first generates new indicators by attaching a year tag to each annual indicator, reducing the data to a two-dimensional space of new indicators and provinces. The second generates new provinces by attaching a year tag to each province's data for each year, reducing the data to a two-dimensional space of indicators and new provinces. Both avoid the information loss caused by averaging an indicator over a time period. The two clustering algorithms are then applied separately to each data set, and the results are compared systematically.
     2. Most clustering methods classify objects by measuring their similarity over all attributes with a similarity function. When there are too many attributes, however, the objects being clustered may not be well correlated under all of them, which reduces the specificity of the classification. To overcome this limitation of general clustering methods, this thesis applies a biclustering algorithm to economic regionalization for the first time. The results show that one province may appear in several biclusters, which removes the restriction in traditional clustering that each object belongs to only one cluster. In addition, provinces that fall in different clusters under ordinary clustering may be assigned to the same bicluster. That is, although provinces differ in overall development level, they may still be similar in some attributes.
     3. This thesis also improves the QUBIC biclustering algorithm. Based on the characteristics of economic data and the principles of the algorithm, the author revises the discretization method, which makes comparisons between objects more reasonable and improves the practical value of the results.
     4. Heat maps are applied to the analysis of the clustering results for the first time, which makes the results visual. The dendrogram attached to each heat map shows the proximity between the clustered objects, and the heat map itself uses three colors, red, black and green, to represent three levels: good, medium and poor.
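Items 3 and 4 both depend on turning continuous indicator values into a small number of levels. The sketch below shows one generic way to do that, assuming cut points at roughly the lower and upper thirds of each indicator's values and assuming that higher values count as "good". It is neither the original QUBIC discretization nor the revised method of the thesis, whose details are not given here; the function name and sample values are hypothetical.

```python
def discretize_three_levels(values):
    """Return -1 (low), 0 (middle) or 1 (high) for each value.

    Cut points sit at roughly the lower and upper thirds of the sorted
    values; this rule is an assumption, not the method of the thesis.
    """
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[(n - 1) // 3]
    high_cut = ordered[2 * (n - 1) // 3]
    levels = []
    for v in values:
        if v <= low_cut:
            levels.append(-1)
        elif v > high_cut:
            levels.append(1)
        else:
            levels.append(0)
    return levels

# Color coding as described in item 4: red = good, black = middle, green = poor.
# Mapping "high" to "good" assumes an indicator where larger is better.
COLOR = {1: "red", 0: "black", -1: "green"}

column = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]  # made-up values of one indicator
levels = discretize_three_levels(column)
colors = [COLOR[level] for level in levels]
```

Applying the rule column by column gives every province one of three levels per indicator, which is the kind of matrix a three-color heat map can display directly.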
