Design and Implementation of an Optimization Scheme for the Griden Data Grid System
Abstract
With the development of computer science and the spread of its applications, people depend more and more on computers in daily life and work. Beyond fields such as biomedicine, human body imaging, weather forecasting, earthquake prediction, and high-energy physics, which generate massive amounts of data, the data produced by everyday life and entertainment is also growing rapidly, at the TB or even PB scale. Managing massive data effectively and accessing it quickly has therefore become an urgent and highly challenging problem. A data grid takes massive, heterogeneous data resources in a wide-area environment as its object and, combined with high-performance computing facilities and large-scale storage devices, provides data storage, data transfer, data access, replica management, and high-performance data processing. It gives users an infrastructure for managing, sharing, and processing massive data, characterized by fairness, self-adaptability, and interactivity.
     As a mechanism for distributed data sharing, management, and processing across wide-area networks, a data grid faces a serious challenge: how to effectively reduce the negative impact of a complex network environment on system performance. This work takes the Griden data grid system as its research background and aims to improve the system's performance and quality of service (QoS). It proposes optimizations that address shortcomings in three parts of the system, namely the metadata service, the control message passing mechanism, and the data transfer mechanism:
     i. For the metadata service of Griden, this work proposes a metadata prefetching strategy that combines single-level prefetching with multi-level caching, together with a metadata prefetching algorithm based on virtual directories and historical access records, called DHMP. Simulation experiments with the GridSM simulator show that the optimized metadata service performs considerably better than before.
     ii. For the control message passing mechanism, this work introduces a divide-and-conquer strategy: different message passing strategies are used within a domain and between domains. Within a domain the emphasis is on transfer efficiency, while between domains the emphasis is on the heterogeneity of resources. A simple theoretical analysis shows that the strategy is feasible.
     iii. For the data transfer mechanism, this work introduces a pull-based data transfer strategy: the controller notifies the data receiver, which then actively fetches the data to be transferred from the data sender in parallel. The data transfer process of Griden is also simplified by reducing the number of nodes that take part in a transfer, so that data moves more directly between nodes, which improves transfer efficiency and network bandwidth utilization.
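The pull-based strategy in item iii can be illustrated with a minimal Python sketch. It is not Griden's implementation: the names `Sender`, `Receiver`, and `controller_transfer`, the in-memory byte buffer standing in for a remote data source, and the chunk size and worker count are all assumptions made for illustration. The sketch only shows the general shape of the idea, in which the controller sends a notification and the receiver then pulls byte ranges from the sender in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class Sender:
    """Hypothetical data sender that serves byte ranges on request."""

    def __init__(self, data: bytes):
        self._data = data

    def size(self) -> int:
        return len(self._data)

    def read_range(self, offset: int, length: int) -> bytes:
        return self._data[offset:offset + length]


class Receiver:
    """Hypothetical data receiver that pulls chunks in parallel once notified."""

    def __init__(self, chunk_size: int = 4, workers: int = 3):
        self.chunk_size = chunk_size
        self.workers = workers

    def on_notify(self, sender: Sender) -> bytes:
        # Split the transfer into fixed-size ranges.
        offsets = range(0, sender.size(), self.chunk_size)
        # Pull the ranges concurrently; map() yields results in input order,
        # so the chunks can be joined directly.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            chunks = pool.map(
                lambda off: sender.read_range(off, self.chunk_size), offsets
            )
            return b"".join(chunks)


def controller_transfer(sender: Sender, receiver: Receiver) -> bytes:
    # The controller only sends the notification; the data itself flows
    # directly from sender to receiver without passing through the controller.
    return receiver.on_notify(sender)
```

In this sketch the controller never touches the payload, which mirrors the stated goal of reducing the number of nodes involved in a transfer. A real system would replace `read_range` with a network request and would need error handling and retries, which are omitted here.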
