Research on Interface Semantics Extension for Object-Based Parallel File Systems
Abstract
With the rapid development of computer technology, new storage management and high-performance interconnect technologies have made it possible to design and build efficient storage systems of thousands of nodes. How to exploit greater software parallelism on the same hardware, so as to resolve the performance bottlenecks and scalability problems that arise as storage demand grows, nevertheless remains a hard problem. Large storage clusters are currently served by scalable parallel file systems such as GPFS, PVFS, Ceph, Lustre, and PanFS, and the storage solutions of the world's Top 500 supercomputers are essentially all drawn from these systems. This dissertation therefore focuses on the architecture and implementation of an efficient parallel file system. To make the designed file system better support high-performance computing, it starts from the interface layer between the parallel file system and its applications and further studies: extending interface semantics to optimize system performance and meet the I/O requirements of high-performance computing; layout-aware optimization of parallel jobs' I/O access patterns to better suit data-intensive scalable computing; methods for supporting new parallel computing frameworks in a parallel file system; and the parallelization of redundancy coding in a parallel file system.
An object-based parallel file system, CapFS, was designed and implemented with the following features: customizable data layout, object-based remote direct data access, and transactional persistent storage management. A unified hierarchical data layout model and algorithm based on nested RAID schemes is proposed, enabling client-driven customizable data layout while preserving full POSIX semantics. To address the flat namespace service and extensible attribute management problems in the object storage device (OSD) specification, an efficient object access and persistent storage management method is proposed that combines a kernel-level micro-database engine with the local file system, supporting persistent storage of variable-length objects and efficient queries over structured attributes. A direct object access method based on the OSD protocol and remote procedure call (RPC) is also proposed; it offers storage clients a shared access mode across multiple network devices and an independent data representation layer for multi-protocol negotiation on object storage nodes. Parameter tuning and end-to-end tests of the CapFS prototype show good performance and scalability.
Analysis shows that traditional file system interface semantics (e.g., POSIX) cannot adequately support high-performance computing. The I/O access patterns of parallel applications typically consist of accesses to large numbers of small files and non-contiguous data blocks, and they emphasize concurrent file access, non-adjacent access, and high metadata rates while also requiring coordinated I/O. To better match the storage system's I/O model to such computing, the POSIX interface is extended. From the perspectives of interface extension and semantics preservation, four extension methods are proposed: concurrent I/O optimization based on shared file descriptors, interface support for non-contiguous I/O, lazy and batched metadata operations, and layout control that preserves POSIX semantics. Tests show the extended interfaces outperform existing methods.
Analysis of the I/O patterns in the MapReduce parallel computing framework reveals that traditional frameworks suffer from heavy intermediate-data copying and communication costs. After comparing how traditional distributed file systems and CapFS differ in supporting such frameworks, a framework model, "MapReduce over CapFS", is proposed that implements MapReduce on top of an I/O-aware interface to CapFS's parameterized layout. I/O benchmarks and real data-intensive applications verify that by using the compute resources of storage nodes for data processing, the model effectively reduces intermediate data volume and the traffic between compute nodes and the storage system. Tests on compute-intensive, I/O-intensive, and mixed applications further show that the model yields the largest speedups for applications with an I/O-intensive component.
A method is proposed for analyzing and designing erasure-coding algorithms with a parallel computing framework, and a parallel redundancy-coding algorithm based on the MapReduce model is given that improves coding efficiency and safeguards system reliability. Based on this algorithm, an asynchronous redundancy-coding framework was implemented in CapFS that supports redundancy configuration at different granularities, including per-file policies, multi-user multi-file-group policies, and device-facing object, object-group, and partition granularities. Algorithm complexity is analyzed with a coding-loss-rate model, and time and space complexity are evaluated in simulations driven by a metadata trace provided by Yahoo. The results show how space utilization changes as the coding granularity grows from single files to user file groups to partitioned object sets, indicating that the system can adaptively trade data reliability against maintenance cost. At typical HPC scales the overhead of parallel erasure coding is low, because redundancy computation is completed asynchronously by back-end storage nodes and parity data does not consume application I/O bandwidth.
With the rapid development of information technology, recent advances in storage system technologies and high-performance interconnects have made it possible to build ever more powerful storage systems serving thousands of nodes. However, how to exploit more software parallelism on the same hardware remains the key issue for the performance bottlenecks and scalability problems that emerge as storage requirements grow. Most cluster storage systems today are managed by scalable parallel file systems such as GPFS, PVFS, Ceph, Lustre, and PanFS, and the storage solutions of the world's Top 500 supercomputers are mostly drawn from these systems. This dissertation therefore focuses on the architecture and implementation methodology of a highly efficient parallel file system. To support high-performance computing (HPC) well, it further studies: interface semantics extensions that optimize performance to meet the I/O requirements of HPC; layout-aware approaches that optimize the I/O patterns of parallel jobs for data-intensive scalable computing; case studies on the coupling between the parallel file system and computing frameworks; and redundancy coding in a parallel file system.
The design and implementation of an object-based parallel file system, named CapFS, gives the prototype several distinguishing characteristics: customizable data distribution strategies, remote direct data access based on the object-based storage device (OSD) protocol, and transactional persistent data management. In detail, the proposed nested-RAID scheme, as a unified data layout model and algorithm, enables client-driven layout computation and maintains a consistent notion of a file's layout that preserves POSIX semantics without restricting concurrent access to the file. To address the flat namespace service and scalable attribute management of the OSD profile, a kernel-level mini database engine combined with the local file system provides highly efficient object access and persistence management, and supports differentiated service for variable-size objects and well-structured attributes. The OSD-over-RPC mechanism offers clients direct object-based access to available, shared OSDs, and an independent data representation layer handles protocol negotiation where multiple storage transfer semantics would otherwise mismatch. Parameter tuning and whole-system test results verify the effectiveness and scalability of the design.
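The client-driven layout computation can be illustrated with a minimal sketch. This assumes a two-level nesting of RAID-0 striping over RAID-1 mirror pairs with illustrative parameter names; the dissertation's actual model and parameters are more general.

```python
# Minimal sketch of client-driven layout computation for a nested-RAID
# layout: an outer RAID-0 stripe over inner RAID-1 mirror pairs.
# Parameter names are illustrative, not CapFS's actual API.

def locate(offset, stripe_unit, mirror_groups):
    """Map a file byte offset to (mirror-group index, offset in object).

    Any client can evaluate this pure function itself, so no server
    round trip is needed to find where a byte lives, and either replica
    in the chosen RAID-1 pair may serve a read.
    """
    stripe_index = offset // stripe_unit        # which stripe unit
    group = stripe_index % mirror_groups        # round-robin over pairs
    # Bytes this group holds from earlier stripes, plus the in-unit remainder.
    obj_offset = (stripe_index // mirror_groups) * stripe_unit \
                 + offset % stripe_unit
    return group, obj_offset
```

Because the mapping is deterministic, every client computes the same layout independently, which is what allows concurrent access without a central layout server on the data path.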
Much evidence and analysis shows that the traditional POSIX interface cannot adequately support HPC parallel applications, whose I/O access patterns often consist of accesses to large numbers of small, non-contiguous pieces of data. Such applications produce interleaved file access patterns with high inter-process spatial locality at the I/O nodes and high metadata throughput. Extensions are needed so that high-concurrency, high-performance computing applications running on the prototype parallel file system can perform well. Four types of interface extensions were therefore presented to match storage I/O semantics to the upper applications: shared file descriptors for concurrent I/O, optimization oriented to non-contiguous I/O, lazy and bulk metadata operations, and layout control that preserves POSIX semantics. This subset of POSIX I/O interfaces was deployed on a clustered file system with a high-speed interconnect. Experimental results on a set of micro-benchmarks confirm that the extensions to the popular interface improve scalability and performance substantially over traditional methods.
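To illustrate how such extensions depart from the one-call-per-piece POSIX pattern, the following sketch shows a non-contiguous read that gathers many extents in one call. The name `readx` and its signature are hypothetical, not the dissertation's actual API, and the gather is emulated client-side with `os.pread`.

```python
# Hypothetical sketch of a non-contiguous read extension: one call
# gathers many (offset, length) extents, so the file system can service
# them as a single request instead of a POSIX seek/read pair per piece.
import os

def readx(fd, extents):
    """Read many (offset, length) pieces of an open file in one call."""
    return [os.pread(fd, length, offset) for offset, length in extents]
```

A real implementation would ship the whole extent list to the server in one request; the point of the extension is that the server sees the full access pattern instead of a stream of unrelated small reads.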
From a bottom-up perspective, taking the popular MapReduce parallel computing framework as an example, it is easy to see that heavy intermediate data copying and communication costs are caused by the semantics mismatch between the existing I/O model and the parallel computing framework. Building on a comparison with traditional distributed file systems, the proposed layout, parameterized by I/O-aware information, is used to implement the MapReduce computing framework over CapFS. I/O benchmarks and real application tests demonstrate that such parallel computation can execute on the parallel file system, in which parallel I/O with several locality-aware optimizations meets the requirement of shipping code to data more flexibly than the Hadoop distributed file system does. Among compute-intensive, I/O-intensive, and mixed applications, the proposed scheme yields the largest speedups for I/O-intensive applications.
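The "shipping code to data" placement that a layout-aware interface enables can be sketched as follows; the plain dict stands in for the proposed layout query, and all names here are illustrative.

```python
# Hypothetical sketch of locality-aware map-task placement: given a
# file's chunk -> storage-node layout (here a plain dict standing in for
# the layout-query interface), schedule each map task on the node that
# already holds its chunk, so only shuffle/reduce traffic crosses the
# network.

def place_tasks(layout):
    """layout: dict mapping chunk id -> storage node holding that chunk.
    Returns a plan: node -> list of chunk ids to process locally."""
    plan = {}
    for chunk in sorted(layout):
        plan.setdefault(layout[chunk], []).append(chunk)
    return plan
```

With such a plan, the map phase reads only node-local data, which is the mechanism behind the reduced compute-to-storage traffic reported above.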
Conversely, from a top-down perspective, an erasure code was implemented with the parallel computing framework to provide better reliability and availability. This solution enables asynchronous compression of initially triplicated data down to RAID-class redundancy overheads, with the algorithms implemented on the MapReduce framework. Based on this algorithm, CapFS implements a redundant data management framework that supports redundancy at different levels, including intra- and inter-file granularity, multiple user groups, and the device level. In contrast to most existing solutions, in which parity data is created on the client side and shipped between clients and servers, the proposed redundancy method is asynchronous and fully transparent to the clients' runtime, and both parity computation and loss recovery are cast as parallel processing procedures. Experiments driven by a metadata trace from Yahoo clusters demonstrate the efficiency of the proposed algorithm and framework.
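The idea of casting parity computation in map/reduce form can be sketched with a toy example. The dissertation's coding framework supports general erasure codes; single XOR parity is used here only to keep the sketch short, and the function names are illustrative.

```python
# Toy sketch of parity computation in map/reduce form: the map phase
# groups equal-sized data blocks by stripe id, and the reduce phase XORs
# each stripe's blocks into one parity block. Back-end storage nodes can
# run the reduces asynchronously, off the client I/O path.
from collections import defaultdict
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(blocks, stripe_width):
    """Return {stripe id: parity block} for a list of data blocks."""
    stripes = defaultdict(list)
    for i, block in enumerate(blocks):          # map phase
        stripes[i // stripe_width].append(block)
    return {sid: reduce(xor_blocks, bs)         # reduce phase
            for sid, bs in stripes.items()}
```

Because each stripe's reduce is independent, the encoding parallelizes across stripes, which is what makes asynchronous back-end redundancy computation cheap at HPC scale.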
引文
[1]Gantz J F, Chute C, Manfrediz A, et al. The Diverse and Exploding Digital Universe. IDC White Paper, IDC. Http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf
    [2]Howe D, Costanzo M, Fey P, et al. Big data:The future of biocuration. Nature Magazine, Nature Publishing Group,2008,455(7209):47-50
    [3]Armbrust M, Fox A, Griffith R, et al. A view of cloud computing. Communications of the ACM, ACM,2010,53(4):50~58
    [4]Moore R W, Baru C, Marciano R, et al. The grid:Data-intensive computing. San Francisco, CA, USA:Morgan Kaufmann Publishers Inc.,1999,105-129
    [5]Perkins L S, Andrews P, Panda D, et al. Data intensive computing. in:Proceedings of the 2006 ACM/IEEE conference on Supercomputing(SC'06), New York, NY, USA:ACM,2006,245~256
    [6]Salem K, Garcia-Molina H. Disk Striping. in:Proceedings of the Second International Conference on Data Engineering, Washington, DC, USA:IEEE Computer Society,1986,336~342
    [7]Gibson G A. Redundant Disk Arrays:Reliable, Parallel Secondary Storage:[PhD Dissertation]. EECS Department, University of California, Berkeley, December,1990
    [8]Ko M, Chadalapaka M, Hufferd J, et al. Internet Small Computer System Interface (iSCSI) Ex-tensions for Remote Direct Memory Access (RDMA). RFC 5046 (Proposed Standard), October, 2007. http://www.ietf.org/rfc/rfc5046.txt
    [9]Goglin B, Prylli L. Performance Analysis of Remote File System Access over High Bandwidth Local Network. in:Proceedings of 18th International Parallel and Distributed Processing Sympo-sium (IPDPS'04)-Workshop 8. IEEE,2004,185-190
    [10]Sandberg R, Golgberg D, Kleiman S, et al. Innovations in Internetworking. Norwood, MA, USA: Artech House, Inc.,1988,379~390
    [11]Howard J H, Kazar M L, Mcnccs S G, ct al. Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS), ACM,1988,6(1):51-81
    [12]Bellovin S M, Merritt M. Limitations of the Kerberos authentication system. ACM SIGCOMM Computer Communication Review, ACM,1990,20(5):119~132
    [13]Satyanarayanan M. Coda:A Highly Available File System for a Distributed Workstation Environ-ment. IEEE Transactions on Computers,1990,39:447~459
    [14]Kazar M L, Leverett B W, Anderson O T, et al. DEcorum File System Architectural Overview. in: Proceedings of the USENIX Summer Technical Conference, Anaheim, CA:USENIX, June,1990, 151-164
    [15]Chutani S, Anderson O T, Kazar M L, et al. The Episode File System. in:Proceedings of the USENIX Winter Technical Conference, San Fransisco, CA, USA:USENIX,1992,43~60
    [16]Gray C, Cheriton D. Leases:an efficient fault-tolerant mechanism for distributed file cache con-sistency. in:Proceedings of the twelfth ACM symposium on Operating systems principles, New York, NY, USA:ACM,1989,202-210
    [17]Wilder D. Book Review:Samba:Integrating UNIX and Windows. Linux Journal, Specialized Systems Consultants, Inc.,1998,1998(50):1-9
    [18]Satyanarayanan M. A study of file sizes and functional lifetimes. in:Proceedings of the eighth ACM symposium on Operating systems principles, New York, NY, USA:ACM,1981,96-108
    [19]Ellard D, Ledlie J, Malkani P, et al. Passive NFS Tracing of Email and Research Workloads. in: Proceedings of 2003 USENIX Conference on File and Storage Technologies(FAST'03), Berkeley, CA, USA:USENIX Association, April,2003,203~216
    [20]Baker M G, Hartman J H, Kupfer M D, et al. Measurements of a distributed file system. in: Proceedings of the thirteenth ACM symposium on Operating systems principles, New York, NY, USA:ACM,1991,198~212
    [21]Ousterhout J K, Da Costa H, Harrison D, et al. A trace-driven analysis of the UNIX 4.2 BSD file system. in:Proceedings of the tenth ACM symposium on Operating systems principles, New York, NY, USA:ACM,1985,15~24
    [22]Fryxell B, Olson K, Ricker P, et al. FLASH:An Adaptive Mesh Hydrodynamics Code for Model-ing Astrophysical Thermonuclear Flashes. The Astrophysical Journal Supplement Series, ACM, 2000,131(1):273~274
    [23]Darling A E, Carey L, Feng W. The Design, Implementation, and Evaluation of mpiBLAST. in: Proceedings of the 4th International Conference on Linux Clusters:The HPC Revolution 2003 in conjunction with ClusterWorld Conference Expo, San Jose, CA:Linux Cluster Institute, June, 2003,1~6
    [24]Miller E L, Katz R H. Input/output behavior of supercomputing applications. in:Proceedings of the 1991 ACM/IEEE conference on Supercomputing, New York, NY, USA:ACM,1991,567~576
    [25]Hildebrand D, Nisar A, Haskin R. pNFS, POSIX, and MPI-IO:a tale of three semantics. in: Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM, 2009,32-36
    [26]Wong A T, Oliker L, Kramer W T C, et al. ESP:a system utilization benchmark. in:Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), Washington, DC, USA:IEEE Computer Society,2000,15-16
    [27]Pasquale B K, Polyzos G C. Dynamic I/O characterization of I/O intensive scientific applica-tions. in:Proceedings of the 1994 conference on Supercomputing, Los Alamitos, CA, USA:IEEE Computer Society Press,1994,660~669
    [28]Kotz D, Nieuwejaar N. Dynamic file-access characteristics of a production parallel scientific workload. in:Proceedings of the 1994 conference on Supercomputing, Los Alamitos, CA, USA: IEEE Computer Society Press,1994,640-649
    [29]Purakayastha A, Ellis C S, Kotz D, et al. Characterizing parallel file-access patterns on a large-scale multiprocessor. in:Proceedings of the 9th International Symposium on Parallel Processing, Washington, DC, USA:IEEE Computer Society,1995,165-172
    [30]Nieuwejaar N, Kotz D, Purakayastha A, et al. File-Access Characteristics of Parallel Scien-tific Workloads. IEEE Transactions on Parallel and Distributed Systems, IEEE Press,1996, 7(10):1075~1089
    [31]Smirni E, Reed D A. Workload Characterization of Input/Output Intensive Parallel Applications. in:Proceedings of the 9th International Conference on Computer Performance Evaluation:Mod-elling Techniques and Tools, London, UK:Springer-Verlag,1997,169-180
    [32]Smirni E, Aydt R A, Chen A A, et al. I/O Requirements of Scientific Applications:An Evolu-tionary View. in:Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing, Washington, DC, USA:IEEE Computer Society,1996,49~58
    [33]Lofstead J, Zheng F, Klasky S, et al. Input/Output APIs and Data Organization for High Perfor-mance Scientific Computing. in:Proceedings of the 2008 ACM Petascale Data Storage Workshop (PDSW 08), Austin, TX:IEEE Computer Society,2008,1-6
    [34]Kallahalla M, Varman P J. Optimal prefetching and caching for parallel I/O sytem.s. in:Proceed-ings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, New York, NY, USA:ACM,2001,219-228
    [35]Maria S. Perez V R J M P, Perez F. Optimizations Based on Hints in a Parallel File System. Computational Science, Springer Berlin/Heidelberg,2004,3038:347~354
    [36]Sun X H, Chen Y, Yin Y. Data layout optimization for petascale file systems. in:Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM,2009,11~15
    [37]Jain R, Somalwar K, Werth J, et al. Heuristics for Scheduling I/O Operations. IEEE Transactions on Parallel and Distributed Systems, IEEE Press,1997,8(3):310~320
    [38]Felix Garcia-Carballeira A C J D G, Sancheza L M. A global and parallel file system for grids. Future Generation Computer Systems, ACM Press,2007,23(1):116-122
    [39]Yu W, Tian Y, Vetter J S. Efficient Zero-Copy Noncontiguous I/O for Globus on InfiniBand. in: Proceedings of the 39th International Conference on Parallel Processing Workshops(ICPPW'10), Washington, DC, USA:IEEE Computer Society,2010,362~368
    [40]Ching A, Choudhary A, Liao W k, et al. Noncontiguous I/O through PVFS. in:Proceedings of the IEEE International Conference on Cluster Computing(CLUSTER'02), Washington, DC, USA: IEEE Computer Society,2002,405~415
    [41]Isaila F, Malpohl G, Olaru V, et al. Integrating collective I/O and cooperative caching into the "clusterfile" parallel file system. in:Proceedings of the 18th annual international conference on Supercomputing, New York, NY, USA:ACM,2004,58~67
    [42]Wang F, Xin Q, Hong B, et al. File System Workload Analysis for Large Scale Scientific Comput-ing Applications. in:Proceedings of the 21st IEEE/12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'04), College Park, MD:IEEE Computer Society, April,2004,139-152
    [43]Uselton A, Howison M, Wright N, et al. Parallel I/O performance:From events to ensembles. in: Proceedings of IEEE International Symposium on Parallel Distributed Processing(IPDPS'10), Atlanta, GA:IEEE Computer Society, April,2010,1-11
    [44]Bailey F R. High-End Computing Challenges in Aerospace Design and Engineering. in:Pro-ceedings of the Third International Conference on Computational Fluid Dynamics(ICCFD'04), Toronto, CA:Springer Berlin Heidelberg, July,2004,13~26
    [45]Hoare C A R. Towards a theory of parallel programming. New York, NY, USA:Springer-Verlag Inc.,2002,231-244
    [46]Snir M, Otto S, Lederman S H, et al. MPI-The Complete Reference. Volume 1-The MPI-1 Core., 2nd ed. The MIT Press,1998,120-145
    [47]Thakur R, Gropp W, Lusk E. On implementing MPI-IO portably and with high performance. in:Proceedings of the sixth workshop on I/O in parallel and distributed systems, New York, NY, USA:ACM,1999,23-32
    [48]Gropp W, Lusk E, Doss N, et al. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput., Elsevier Science Publishers B. V.,1996, 22(6):789~828
    [49]Thakur R, Gropp W, Lusk E. Data Sieving and Collective I/O in ROMIO. in:Proceedings of the The 7th Symposium on the Frontiers of Massively Parallel Computation, Washington, DC, USA: IEEE Computer Society,1999,182-190
    [50]Coarfa C, Dotsenko Y, Mellor-Crummey J, et al. An evaluation of global address space languages: co-array fortran and unified parallel C. in:Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, New York, NY, USA:ACM,2005,36~47
    [51]Thakur R, Choudhary A, Bordawekar R, et al. Passion:Optimized I/O for Parallel Applications. IEEE Computer, IEEE Computer Society Press,1996,29(6):70~78
    [52]Kotz D. Disk-directed I/O for MIMD multiprocessors. ACM Trans. Comput. Syst., ACM,1997, 15(1):41-74
    [53]Seamons K E, Chen Y, Jones P, et al. Server-directed collective I/O in Panda. in:Proceedings of the ACM/IEEE conference on Supercomputing (SC'95), New York, NY, USA:ACM,1995, 57~58
    [54]Juszczak C. Improving the write performance of an NFS server. in:Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, Berkeley, CA, USA:USENIX Association,1994,20~21
    [55]Lombard P, Denneulin Y. nfsp:A Distributed NFS Server for Clusters of Workstations. in:Pro-ceedings of the 16th International Parallel and Distributed Processing Symposium, Washington, DC, USA:IEEE Computer Society,2002,352~360
    [56]Thekkath C A, Mann T, Lee E K. Frangipani:a scalable distributed file system. in:Proceedings of the sixteenth ACM symposium on Operating systems principles, New York, NY, USA:ACM, 1997,224~237
    [57]Miller S W.A reference model for mass storage systems. Advances in computers, Academic Press Professional, Inc.,1988,18(7):157~210
    [58]Drapeau A L, Shirriff K W, Hartman J H, et al. RAID-II:a high-bandwidth network file server. in: Proceedings of the 21st annual international symposium on Computer architecture, Los Alamitos, CA, USA:IEEE Computer Society Press,1994,234~244
    [59]Cabrera L, Long D D. SWIFT:USING DISTRIBUTED DISK STRIPING TO PROVIDE HIGH I/O DATA RATES. Technical report, University of California at Santa Cruz, CA, USA,1991
    [60]Schmuck F, Haskin R. GPFS:A Shared-Disk File System for Large Computing Clusters. in: Proceedings of the 1 st USENIX Conference on File and Storage Technologies, Berkeley, CA, USA:USENIX Association, January,2002,231-244
    [61]Soltis S R, Ruwart T M, Oapos;Keefe M T, et al. The Global File System:A File System for Shared Disk Storage. IEEE Transactions on Parallel and Distributed Systems, IEEE Computer Society,1997,10(4):1-40
    [62]Fasheh M. OCFS2:The Oracle Clustered File System, Version 2. in:Proceedings of the 2006 Linux Symposium, Ottawa, Canada:ACM Press, March,2006,289~302
    [63]Menon J, Pease D A, Rees R M, et al. IBM Storage Tank-A heterogeneous scalable SAN file system. IBM Systems Journal, IBM Press,2003,42:250~267
    [64]Welch B, Unangst M, Abbasi Z, et al. Scalable performance of the Panasas parallel file system. in:Proceedings of the 6th USENIX Conference on File and Storage Technologies(FAST'08), Berkeley, CA, USA:USENIX Association,2008,1~17
    [65]Carns P H, Ligon W B, Ross R B, et al. PVFS:a parallel file system for linux clusters. in: Proceedings of the 4th annual Linux Showcase & Conference, Berkeley, CA, USA:USENIX Association,2000,28~29
    [66]Hildebrand D, Honeyman P. Direct-pNFS:scalable, transparent, and versatile access to parallel file systems. in:Proceedings of the 16th international symposium on High performance distributed computing(HPDC'07), New York, NY, USA:ACM,2007,199~208
    [67]Weil S A, Brandt S A, Miller E L, et al. Ceph:a scalable, high-performance distributed file system. in:Proceedings of the 7th symposium on Operating systems design and implementation(OSDI '06), Berkeley, CA, USA:USENIX Association,2006,307~320
    [68]Vaidyanathan K, Panda D. Benefits of I/O Acceleration Technology (I/OAT) in Clusters. in: Proceedings of the IEEE International Symmposium on Performance Analysis of Systems and Software, Los Alamitos, CA, USA:IEEE Computer Society,2007,220~229
    [69]Nisar A, Liao W k, Choudhary A. Scaling parallel I/O performance through I/O delegate and caching system. in:Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Piscat-away, NJ, USA:IEEE Press,2008,1-12
    [70]Patrick C M, Kandemir M, Karakoy M, et al. Cashing in on hints for better prefetching and caching in PVFS and MPI-IO. in:Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, New York, NY, USA:ACM,2010,191-202
    [71]Posadas H, Adamez J, Sanchez P, et al. POSIX modeling in SystemC. in:Proceedings of ASP-DAC'06:the 2006 Asia and South Pacific Design Automation Conference, Piscataway, NJ, USA: IEEE Press,2006,485-490
    [72]Walli S R. The POSIX family of standards. StandardView, ACM,1995,3(1):11-17
    [73]Gray J, Liu D T, Nieto-Santisteban M, et al. Scientific data management in the coming decade. SIGMOD Rec., ACM,2005,34(4):34~41
    [74]Latham R, Ross R, Thakur R. The Impact of File Systems on MPI-IO Scalability. Recent Advances in Parallel Virtual Machine and Message Passing Interface, Springer Berlin/Heidelberg,2004, 3241:146-198
    [75]Schikuta E, Wanek H. Parallel I/O. Int. J. High Perform. Comput. Appl., Sage Publications, Inc., 2001,15(2):162~168
    [76]Terrasa A, Espinosa A, Garcia-Fornes A. Lightweight POSIX tracing. Softw. Pract. Exper., John Wiley & Sons, Inc.,2008,38(5):447-469
    [77]Hildebrand D, Nisar A, Haskin R. pNFS, POSIX, and MPI-IO:a tale of three semantics. in: Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM, 2009,32-36
    [78]Ching A, Coloma K, Li J, et al. High-Performance Techniques for Parallel I/O. CRC Press, December,2007
    [79]Berson A. Client/server architecture (2nd ed.). New York, NY, USA:McGraw-Hill, Inc.,1996
    [80]Booth D, Haas H, Mccabe F, et al. Web Services Architecture. BT TECHNOLOGY JOURNAL, ACM Press,2006,22(1):19~26
    [81]Erl T. Service-Oriented Architecture:Concepts, Technology, and Design. Upper Saddle River, NJ, USA:Prentice Hall PTR,2005
    [82]Shao G, Berman F, Wolski R. Master/Slave Computing on the Grid. Heterogeneous Computing Workshop, IEEE Computer Society,2000,0:3-9
    [83]Singh M P. Peer-to-peer computing for information systems. in:Proceedings of the 1st in-ternational conference on Agents and peer-to-peer computing(AP2PC'02), Berlin, Heidelberg: Springer-Verlag,2003,15~20
    [84]Sterling T, Lusk E, Gropp W, editors. Beowulf Cluster Computing with Linux,2 ed. Cambridge, MA, USA:MIT Press,2003
    [85]Hammond J L, Minyard T, Browne J. End-to-end framework for fault management for open source clusters:Ranger. in:Proceedings of the 2010 TeraGrid Conference(TG'10), New York, NY, USA: ACM,2010,1-6
    [86]Dean J, Ghemawat S. MapReduce:simplified data processing on large clusters. Communications of the ACM-50th anniversary issue:1958-2008, ACM,2008,51(1):107-113
    [87]Chang F, Dean J, Ghemawat S, et al. Bigtable:A Distributed Storage System for Structured Data. in:Proceedings of the 7th Symposium on Operating System Design and Implementa-tion(OSDI'06), Seattle, WA, USA:ACM Press, November,2006,205-218
    [88]Cooper B F, Ramakrishnan R, Srivastava U, et al. PNUTS:Yahoo!'s hosted data serving platform. Proc. VLDB Endow., VLDB Endowment,2008,1(2):1277~1288
    [89]Olston C, Reed B, Srivastava U, et al. Pig latin:a not-so-foreign language for data process-ing. in:Proceedings of the 2008 ACM SIGMOD international conference on Management of data(SIGMOD'08), New York, NY, USA:ACM,2008,1099~1110
    [90]Stonebraker M, Abadi D J, Batkin A, et al. C-store:a column-oriented DBMS. in:Proceedings of the 31st international conference on Very large data bases(VLDB'05). VLDB Endowment,2005, 553~564
    [91]Boncz P, Grust T, Keulen M, et al. MonetDB/XQuery:a fast XQuery processor powered by a relational engine. in:Proceedings of the 2006 ACM SIGMOD international conference on Management of data(SIGMOD'06), New York, NY, USA:ACM,2006,479-490
    [92]Grossman R, Gu Y. Data mining using high performance data clouds:experimental studies us-ing sector and sphere. in:Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA:ACM,2008,920~927
    [93]Palankar M R, Iamnitchi A, Ripeanu M, et al. Amazon S3 for science grids:a viable solution? in:Proceedings of the 2008 international workshop on Data-aware distributed computing(DADC '08), New York, NY, USA:ACM,2008,55~64
    [94]Waas F M. Beyond Conventional Data Warehousing:Massively Parallel Data Processing with Greenplum Database. Business Intelligence for the Real-Time Enterprise, Lecture Notes in Busi-ness Information Processing, Springer Berlin Heidelberg,2009,27:89~96
    [95]Watanabe H, Bailey B, Duda K, et al. The ASTER Data System:An Overview of the Data Products in Japan and in the United States. Land Remote Sensing and Global Environmental Change-Remote Sensing and Digital Image Processing, Springer New York,2011,11:233~244
    [96]Wang F, Br S A, Miller E L, et al. OBFS:A file system for object-based storage devices. in: Proceedings of the 2003 Conference on File and Storage Technologies(FAST'03), Berkeley, CA, USA:USENIX Association,2003,283~300
    [97]Abiteboul S, Cluet S, Milo T. A database interface for file update. in:Proceedings of SIGMOD '95:the 1995 ACM SIGMOD international conference on Management of data, New York, NY, USA:ACM Press, May,1995,386~397
    [98]Traeger A, Zadok E, Joukov N, et al. A nine year study of file system and storage benchmarking. in:Proceedings of FAST'08:the 7th USENIX Conference on File and Storage Technologies, New York, NY, USA:ACM,2008,1-56
    [99]Factor M, Meth K, Naor D, et al. Object storage:the future building block for storage systems. in:Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Tech-nology, Washington, DC, USA:IEEE Computer Society,2005,119~123
    [100]Mesnier M, Ganger G R, Riedel E. Object-based storage. IEEE Communications Magazine, IEEE Press,2003,41 (8):84~90
    [101]John, Thekkath R A, Zhou L. Boxwood:Abstractions as the Foundation for Storage Infrastructure. in:Proceedings of the 5th symposium on Operating systems design and implementation(OSDI '04), Berkeley, CA, USA:USENIX Association,2004,105~120
    [102]Gifford D K, Jouvclot P, Sheldon M A, et al. Semantic file systems. in:Proceedings of the thirteenth ACM symposium on Operating systems principles(SOSP'91), New York, NY, USA: ACM,1991,16~25
    [103]Grider G, Nunez J, Bent J, et al. Coordinating government funding of file system and I/O research through the high end computing university research activity. SIGOPS Oper. Syst. Rev., ACM, 2009,43(1):2-7
    [104]Rajgarhia A, Gehani A. Performance and extension of user space file systems. in:Proceedings of the 2010 ACM Symposium on Applied Computing, New York, NY, USA:ACM,2010,206-213
    [105]Liao W, Coloma K, Choudhary A, et al. Scalable Design and Implementations for MPI Parallel Overlapping I/O. IEEE Transactions on Parallel and Distributed Systems, IEEE Computer Society, 2006,17:1264~1276
    [106]Vilayannur M, Ross R B, Cams P H, et al. On the Performance of the POSIX I/O Interface to PVFS. in:Proceedings of Euromicro Conference on Parallel, Distributed, and Network-Based Processing, Los Alamitos, CA, USA:IEEE Computer Society,2004,332~342
    [107]Liao W, Ching A, Coloma K, et al. Improving MPI independent write performance using a two-stage write-behind buffering method. in:Proceedings of the International Parallel and Dis-tributed Processing Symposium, Los Alamitos, CA, USA:IEEE Computer Society, March,2007, 295~306
    [108]Chen Y, Sun X H, Thakur R, et al. Improving Parallel I/O Performance with Data Layout Aware-ness. in:Proceedings of the 2010 IEEE International Conference on Cluster Computing, Wash-ington, DC, USA:IEEE Computer Society,2010,302~311
    [109]Sun X H, Chen Y, Yin Y. Data layout optimization for petascale file systems. in:Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM,2009,11-15
    [110]Sung I J, Stratton J A, Hwu W M W. Data layout transformation exploiting memory-level paral-lelism in structured grid many-core applications. in:Proceedings of the 19th international con-ference on Parallel architectures and compilation techniques, New York, NY, USA:ACM,2010, 513~522
    [111]Wozniak J M, Wilde M. Case studies in storage access by loosely coupled petascale applications. in:Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA: ACM,2009,16-20
    [112]Dadeau F, Kermadec A, Tissot R. Combining Scenario- and Model-Based Testing to Ensure POSIX Compliance. in:Proceedings of the 1st international conference on Abstract State Ma-chines, B and Z(ABZ'08), Berlin, Heidelberg:Springer-Verlag,2008,153-166
    [113]Ji Q, Qing S, He Y. A new formal model for privilege control with supporting POSIX capability mechanism. Science in China Series F:Information Sciences, Science China Press, co-published with Springer,2005,48(1):46~66
    [114]Carns P, Lang S, Ross R, ct al. Small-file access in parallel file systems. in:Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing, Washington, DC, USA: IEEE Computer Society,2009,1-11
    [115]Jin C, Buyya R. MapReduce Programming Model for.NET-Based Cloud Computing. in:Pro-ceedings of Euro-Par 2009 Parallel Processing. Springer Berlin/Heidelberg,2009,417~428
    [116]Xu Y, Kostamaa P, Gao L. Integrating hadoop and parallel DBMs. in:Proceedings of the 2010 international conference on Management of data, New York, NY, USA:ACM,2010,969~974
    [117]Condie T, Conway N, Alvaro P, et al. MapReduce online. in:Proceedings of the 7th USENIX conference on Networked systems design and implementation, Berkeley, CA, USA:USENIX As-sociation,2010,21-22
    [118]Iu M Y, Zwaenepoel W. HadoopToSQL:a mapReduce query optimizer. in:Proceedings of the 5th European conference on Computer systems, New York, NY, USA:ACM,2010,251-264
    [119]Yang H c, Dasdan A, Hsiao R L, ct al. Map-reduce-merge:simplified relational data process-ing on large clusters. in:Proceedings of the 2007 ACM SIGMOD international conference on Management of data, New York, NY, USA:ACM,2007,1029-1040
    [120]Ekanayake J, Li H, Zhang B, et al. Twister:a runtime for iterative MapReduce. in:Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, New York, NY, USA:ACM,2010,810-818
    [121]Lin J, Schatz M. Design patterns for efficient graph algorithms in MapReduce. in:Proceedings of the Eighth Workshop on Mining and Learning with Graphs, New York, NY, USA:ACM,2010, 78~85
    [122]Chen S, Schlosser S W. Map-Reduce Meets Wider Varieties of Applications. Technical Report IRP-TR-08-05, Intel Research Pittsburgh, May,2008
    [123]He B, Fang W, Luo Q, et al. Mars:a MapReduce framework on graphics processors. in:Proceedings of the 17th international conference on Parallel architectures and compilation techniques, New York, NY, USA:ACM,2008,260-269
    [124]Kaashoek F, Morris R, Mao Y. Optimizing MapReduce for Multicore Architectures. Technical Report MIT-CSAIL-TR-2010-020, MIT, May 05,2010. http://hdl.handle.net/1721.1/54692
    [125]Chen R, Chen H, Zang B. Tiled-MapReduce:optimizing resource usages of data-parallel applications on multicore with tiling. in:Proceedings of the 19th international conference on Parallel architectures and compilation techniques, New York, NY, USA:ACM,2010,523-534
    [126]Sandholm T, Lai K. MapReduce optimization using regulated dynamic prioritization. in:Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, New York, NY, USA:ACM,2009,299-310
    [127]Ananthanarayanan R, Gupta K, Pandey P, et al. Cloud analytics:do we really need to reinvent the storage stack? in:Proceedings of the 2009 conference on Hot topics in cloud computing, Berkeley, CA, USA:USENIX Association,2009,15-16
    [128]Moise D, Antoniu G, Bouge L. Improving the Hadoop map/reduce framework to support concurrent appends through the BlobSeer BLOB management system. in:Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, New York, NY, USA: ACM,2010,834-840
    [129]Venner J. Pro Hadoop,1st ed. Berkeley, CA, USA:Apress,2009
    [130]Molina-Estolano E, Gokhale M, Maltzahn C, et al. Mixing Hadoop and HPC workloads on parallel filesystems. in:Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM,2009,1-5
    [131]Wang G, Salles M V, Sowell B, et al. Behavioral simulations in MapReduce. Proc. VLDB Endow., VLDB Endowment,2010,3(1-2):952-963
    [132]Abouzeid A, Bajda-Pawlikowski K, Abadi D, et al. HadoopDB:an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., VLDB Endowment, 2009,2(1):922-933
    [133]Maltzahn C, Molina-Estolano E, Khurana A, et al. Ceph as a Scalable Alternative to the Hadoop Distributed File System. ;login:The USENIX Magazine, USENIX Association,2010,35(4):38-49
    [134]Hoefler T, Lumsdaine A, Dongarra J. Towards Efficient MapReduce Using MPI. in:Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, Berlin, Heidelberg:Springer-Verlag,2009,240-249
    [135]Jiang D, Ooi B C, Shi L, et al. The performance of MapReduce:an in-depth study. Proc. VLDB Endow., VLDB Endowment,2010,3(1-2):472-483
    [136]Babu S. Towards automatic optimization of MapReduce programs. in:Proceedings of the 1st ACM symposium on Cloud computing(SoCC'10), New York, NY, USA:ACM,2010,137-142
    [137]Plank J S, Luo J, Schuman C D, et al. A performance evaluation and examination of open-source erasure coding libraries for storage. in:Proceedings of the 7th conference on File and storage technologies, Berkeley, CA, USA:USENIX Association,2009,253-265
    [138]Hafner J L. WEAVER codes:highly fault tolerant erasure codes for storage systems. in:Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies, Berkeley, CA, USA:USENIX Association,2005,16-18
    [139]Goodson G R, Wylie J J, Ganger G R, et al. Efficient Byzantine-Tolerant Erasure-Coded Storage. in:Proceedings of the International Conference on Dependable Systems and Networks, Los Alamitos, CA, USA:IEEE Computer Society,2004,135-143
    [140]Chun B G, Dabek F, Haeberlen A, et al. Efficient replica maintenance for distributed storage systems. in:Proceedings of the 3rd conference on Networked Systems Design & Implementation(NSDI'06), Berkeley, CA, USA:USENIX Association,2006,4-5
    [141]Dimakis A G, Prabhakaran V, Ramchandran K. Decentralized erasure codes for distributed networked storage. IEEE/ACM Transactions on Networking (TON)-Special issue on networking and information theory, IEEE Press,2006,14(SI):2809-2816
    [142]Dimakis A G, Godfrey P B, Wu Y, et al. Network coding for distributed storage systems. IEEE Trans. Inf. Theor., IEEE Press,2010,56(9):4539-4551
    [143]Seo S, Jang I, Woo K, et al. HPMR:Prefetching and pre-shuffling in shared MapReduce computation environment. in:Proceedings of the IEEE International Conference on Cluster Computing and Workshops(CLUSTER'09), New Orleans, LA:IEEE Computer Society, August,2009,1-8
    [144]Settlemyer B W. A study of client-based caching for parallel I/O:[PhD Dissertation]. Clemson, SC, USA:School of Computing, Clemson University, August,2009
    [145]Plank J S, Thomason M G. A Practical Analysis of Low-Density Parity-Check Erasure Codes for Wide-Area Storage Applications. in:Proceedings of the International Conference on Dependable Systems and Networks, Los Alamitos, CA, USA:IEEE Computer Society,2004,115-120
    [146]Fan B, Tantisiriroj W, Xiao L, et al. DiskReduce:RAID for data-intensive scalable computing. in: Proceedings of the 4th Annual Workshop on Petascale Data Storage, New York, NY, USA:ACM, 2009,6-10
    [147]Feldman J. Using many machines to handle an enormous error-correcting code. in:Proceedings of the IEEE Conference on Information Theory Workshop(ITW'06), Punta del Este, Uruguay:IEEE Computer Society, June,2006,180-182
    [148]Chen Y, Ganapathi A, Katz R H. To compress or not to compress-compute vs. IO tradeoffs for MapReduce energy efficiency. in:Proceedings of the first ACM SIGCOMM workshop on Green networking, New York, NY, USA:ACM,2010,23-28
    [149]Zhang Z, Deshpande A, Ma X, et al. Does erasure coding have a role to play in my data center? Technical Report MSR-TR-2010-52, Microsoft Research Cambridge (UK), May,2010. http://www4.ncsu.edu/~zzhang3/pubs/ec_msr_tr.pdf