Performance Optimization of Transactional Memory Architectures Supporting Speculative Parallelization on Multicore Platforms
Abstract
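As multicore platforms have become widespread, using multiple cores to speed up serial applications has become a major research topic for both academia and industry. Traditional explicit lock-based synchronization has inherent drawbacks: it is complex, error-prone and conservative in performance. These drawbacks fundamentally limit the scalability of parallel programs and the efficiency of parallel programming, and they also prevent full use of multicore resources. To exploit more of the thread-level parallelism available on multicore architectures, academia and industry now agree on using Transactional Memory (TM). TM addresses two problems: the complexity that maintaining correctness adds to parallel programming, and the performance limits that this correctness maintenance imposes. This dissertation starts from the goal of effectively exploiting the thread-level parallelism in applications. It focuses on three goals: high performance, ease of programming and compatibility. Through hardware/software co-design, it studies in depth a multicore transactional memory architecture that supports speculative parallelization. The aim is threefold: raise the effective utilization of on-chip computing resources, lower the difficulty of parallel programming, and let legacy application software migrate smoothly to multicore platforms.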
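     This dissertation studies the performance optimization of a hardware/software co-designed multicore transactional memory architecture that supports speculative parallelization. The study is in-depth and systematic, and it proceeds from two perspectives: thread partitioning and thread execution. It covers the architecture model, the programming model, the performance analysis model, a thread partitioning mechanism guided by offline profiling, and a thread execution mechanism guided by online profiling. The main contributions are:
(1) Survey and architecture design. The dissertation surveys in detail the development trends of the two mainstream thread-level speculative parallelization techniques and compares their hardware and software support mechanisms. On this basis, it proposes a new hardware/software co-designed multicore transactional memory architecture that supports speculative parallelization. The system applies the ideas of software thread-level speculation to thread partitioning and relies on hardware transactional memory support for thread execution. Offline and online profiling coordinate the various hardware and software factors, so the design both improves program performance and lowers the difficulty of parallel programming.
(2) Software thread partitioning. The goals are simpler parallel programming and better parallel execution performance. The dissertation proposes a set of criteria, research methods and profiling mechanisms for speculative thread-level parallelism. It adopts offline profiling to implement a thread partitioning scheme based on transactional memory. Based on this mechanism, it designs and implements OpenPro, an offline profiling toolset for speculative thread-level parallelism.
(3) Application analysis. Using OpenPro, the dissertation analyzes the key factors that affect speculative thread-level parallelism in desktop, multimedia and high-performance computing (HPC) applications, from the viewpoint of each application's inherent parallel potential. This yields several important insights. For example, a single program can effectively use the computing resources of only about 16 cores, and this marks an inflection point for the current multicore technology roadmap.
(4) Thread execution support. The goals are good scalability and easy hardware implementation. The dissertation proposes a directory-based cache coherence protocol that supports priority arbitration. On this basis, it designs and builds PTT, a hardware simulator of a distributed, scalable multicore transactional memory processor. PTT uses a runtime library mechanism to support both thread-level speculation and transactional memory semantics. Earlier designs of this kind relied on centralized structures such as buses, which limited hardware scalability; the PTT design removes that limit and achieves both good scalability and easy hardware implementation. Its distributed hardware transactional memory uses eager version management and eager conflict detection, so consistency is maintained automatically. This greatly reduces the tedious work and complexity programmers face when writing parallel programs, which matters for popularizing parallel programming and raising its productivity.
(5) Performance model and evaluation. The dissertation proposes PCL, a performance analysis model for thread-level speculation. Based on this model, the final optimization scheme brings online profiling into the PTT hardware simulation platform. The PTT system's hardware and software mechanisms are analyzed and coordinated, and the system is evaluated thoroughly at three levels: correctness, effectiveness and flexible configurability.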
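The offline profiling behind item (2) is not specified in detail here. As a rough illustration only, the sketch below shows one common way a dependence profiler can judge whether a loop suits speculative threads. It replays a per-iteration memory-access trace and counts the iterations that read a value written by an earlier iteration close enough to run at the same time. The following are all hypothetical and are not OpenPro's actual interface or criteria:

- the trace format;
- the functions `cross_iteration_raw` and `is_tls_candidate`;
- the 10% threshold.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace format: one list per loop iteration, each entry ("R" | "W", address).
Access = Tuple[str, int]


def cross_iteration_raw(trace: List[List[Access]], num_threads: int) -> List[int]:
    """Indices of iterations that read a value written by one of the previous
    num_threads - 1 iterations, i.e. an iteration that may run concurrently
    under speculative execution (a likely dependence violation)."""
    last_writer: Dict[int, int] = {}  # address -> latest iteration that wrote it
    dependent: List[int] = []
    for it, accesses in enumerate(trace):
        written_here: Set[int] = set()
        violates = False
        for op, addr in accesses:
            if op == "W":
                written_here.add(addr)
            elif addr not in written_here:  # read not served by this iteration's own write
                src = last_writer.get(addr)
                if src is not None and it - src < num_threads:
                    violates = True
        if violates:
            dependent.append(it)
        for addr in written_here:
            last_writer[addr] = it
    return dependent


def is_tls_candidate(trace: List[List[Access]], num_threads: int = 4,
                     max_ratio: float = 0.1) -> bool:
    """Hypothetical criterion: speculate only if few iterations would violate."""
    if not trace:
        return False
    ratio = len(cross_iteration_raw(trace, num_threads)) / len(trace)
    return ratio <= max_ratio


independent = [[("R", 100 + i), ("W", 200 + i)] for i in range(8)]
chain = [[("R", 0), ("W", 0)] for _ in range(8)]  # loop-carried accumulator
print(is_tls_candidate(independent), is_tls_candidate(chain))  # True False
```

The window depends on `num_threads` because a dependence only causes a violation when the producing iteration may still be running. For example, a distance-4 dependence is harmless on 4 speculative threads but harmful on 8.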
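     This research leads to the following important conclusions:
(1) Combining thread-level speculation with transactional memory, and coordinating hardware and software factors sensibly, can effectively exploit the latent thread-level parallelism in serial programs. At the same time, it lowers the difficulty of parallel programming and greatly improves parallel programming productivity.
(2) The current technology roadmap builds multicore chips from conventional superscalar cores. Under that roadmap, we weigh effective hardware utilization against extracting as much of a program's inherent parallelism as possible when accelerating a single serial application. On that basis, the suitable number of cores is:
    - 2-4 cores for some applications with heavy dependences, such as SPEC and some dependence-heavy scientific computing programs;
    - 8-16 cores for most multimedia and HPC applications;
    - 64-128 cores or more for some especially suitable applications.
(3) Thread-level speculation currently performs poorly on applications with heavy data dependences, such as desktop applications. It is still appropriate for some multimedia and HPC applications that involve a large amount of computation, have moderate speculative thread granularity and contain ambiguous dependences. The greatest strengths of thread-level speculation are compatibility and ease of programming. If these strengths are fully exploited, most classic application software can migrate smoothly to multicore platforms and programmers are freed further. Thread-level speculation will then hold an important place in computer architecture research.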
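     The research and results of this dissertation can guide the design of shared-memory multicore chip architectures and their parallel programming environments. The goal is to extract as much thread-level parallelism as possible from serial programs for use on multicore chips, with the lowest possible parallel programming difficulty, system software complexity and hardware cost.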
With the spread of multicore platforms, using multicore computing resources to accelerate traditional serial applications has become a common concern. Traditional explicit lock-based synchronization has inherent defects: complexity, error-proneness and conservative performance. These defects fundamentally limit the scalability and efficiency of parallel programming and prevent full use of multicore resources. To exploit more thread-level parallelism on multicore architectures, academia and industry agree on using Transactional Memory (TM) to address the complexity of traditional parallel programming and its performance constraints. This dissertation starts from the effective exploitation of thread-level parallelism in applications. It aims at three goals: high performance, ease of programming and compatibility. Combining hardware and software, it conducts in-depth research on a multicore transactional memory architecture that supports speculative parallelization. The resulting design can raise the effective utilization of multicore computing resources, effectively reduce the difficulty of parallel programming and enable the smooth migration of legacy applications.
     This dissertation carries out an in-depth, systematic study of both thread partitioning and thread execution in a multicore transactional memory architecture that supports speculative parallelization. The study covers the structural model, the programming model, the performance analysis model, thread partitioning guided by offline profiling and thread execution guided by online profiling. The major research contributions are:
(1) Survey and architecture design. The dissertation surveys in detail the development trends of the two main speculative thread-level parallelization techniques and compares their software and hardware support mechanisms. It then proposes a novel hardware/software co-designed multicore transactional memory architecture. The architecture uses software thread-level speculation ideas to guide thread partitioning and hardware transactional memory to support thread execution. Offline and online profiling coordinate the hardware and software elements, so the design both improves application performance and reduces the difficulty of parallel programming.
(2) Software thread partitioning. The goals are simpler parallel programming and better parallel execution performance. The dissertation proposes a set of criteria, research methods and profiling mechanisms for speculative thread-level parallelism, and it establishes an offline-profiling-guided thread partitioning scheme for transactional memory. Based on these criteria, it develops OpenPro, a set of offline profiling tools for exploring thread-level parallelism.
(3) Application analysis. The dissertation analyzes the key factors affecting thread-level parallelism in desktop, media and HPC applications, from the perspective of each application's own parallelism potential.
(4) Hardware thread execution. The goals are good scalability and ease of implementation. The dissertation proposes a directory-based cache coherence protocol that supports priority arbitration. On this basis, it develops PTT, a hardware simulator of a scalable, distributed multicore transactional memory processor. PTT supports both thread-level speculation and transactional memory semantics through run-time library mechanisms. Earlier designs relied on centralized structures such as buses, which limited hardware scalability; PTT breaks through this limit and achieves both good scalability and ease of hardware design. The distributed transactional memory system uses eager version management and eager conflict detection. As a result, it maintains consistency automatically and greatly reduces the tedium and complexity of parallel programming. This matters for raising parallel programming productivity and making parallel programming widespread.
(5) Performance model and evaluation. The dissertation proposes PCL, a performance analysis model for speculative thread-level parallelism. Guided by PCL, online profiling is brought into the PTT platform. Finally, by coordinating a variety of hardware and software mechanisms, the dissertation evaluates and analyzes the PTT system at three levels: correctness, effectiveness and flexibility.
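The three mechanisms in item (4) can be illustrated with a deliberately simplified software model. These mechanisms are eager version management, eager conflict detection and priority arbitration. The sketch below is a toy built on assumptions, not the PTT design:

- Writes update memory in place and save the old value in an undo log (eager versioning).
- Every access checks a per-address directory entry for conflicting readers or writers (eager detection).
- Conflicts are resolved in favor of the older transaction. A transaction's program-order sequence number serves as its priority, as thread-level speculation requires.

The class and method names are hypothetical. In-order commit, stalling and cache-line granularity are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


class Abort(Exception):
    """Raised when the requesting transaction loses a conflict or was already aborted."""


@dataclass
class Tx:
    seq: int  # program-order sequence number: lower = older = higher priority
    undo_log: List[Tuple[int, int]] = field(default_factory=list)  # (address, old value)
    active: bool = True


@dataclass
class DirEntry:
    readers: Set[int] = field(default_factory=set)  # seqs of active readers
    writer: Optional[int] = None                    # seq of the active writer, if any


class EagerTM:
    """Toy eager-versioning, eager-detection TM with priority arbitration."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.mem = memory                          # shared memory, updated in place
        self.directory: Dict[int, DirEntry] = {}   # hypothetical per-address directory
        self.txs: Dict[int, Tx] = {}               # active transactions by seq

    def begin(self, seq: int) -> Tx:
        tx = Tx(seq)
        self.txs[seq] = tx
        return tx

    def _check(self, tx: Tx, conflicting: Set[int]) -> None:
        if not tx.active:
            raise Abort(f"tx {tx.seq} was aborted by an older transaction")
        # Older transactions sort first, so an older conflicting tx wins
        # before any younger one is aborted.
        for other in sorted(conflicting - {tx.seq}):
            if other < tx.seq:
                self.abort(tx)
                raise Abort(f"tx {tx.seq} lost a conflict to older tx {other}")
            self.abort(self.txs[other])  # requester is older: abort the younger tx

    def read(self, tx: Tx, addr: int) -> int:
        entry = self.directory.setdefault(addr, DirEntry())
        self._check(tx, {entry.writer} if entry.writer is not None else set())
        entry.readers.add(tx.seq)
        return self.mem.get(addr, 0)

    def write(self, tx: Tx, addr: int, value: int) -> None:
        entry = self.directory.setdefault(addr, DirEntry())
        conflicting = set(entry.readers)
        if entry.writer is not None:
            conflicting.add(entry.writer)
        self._check(tx, conflicting)
        if entry.writer != tx.seq:  # first write: log the old value (eager versioning)
            tx.undo_log.append((addr, self.mem.get(addr, 0)))
            entry.writer = tx.seq
        self.mem[addr] = value      # new value goes straight to memory

    def abort(self, tx: Tx) -> None:
        for addr, old in reversed(tx.undo_log):
            self.mem[addr] = old    # roll back in-place writes, newest first
        self._release(tx)

    def commit(self, tx: Tx) -> None:
        self._release(tx)           # new values are already in memory; drop the log

    def _release(self, tx: Tx) -> None:
        for entry in self.directory.values():
            entry.readers.discard(tx.seq)
            if entry.writer == tx.seq:
                entry.writer = None
        tx.undo_log.clear()
        tx.active = False
        self.txs.pop(tx.seq, None)


mem = {0: 10}
tm = EagerTM(mem)
older, younger = tm.begin(1), tm.begin(2)
tm.write(younger, 0, 99)                    # in-place write; old value 10 is logged
print(tm.read(older, 0), younger.active)    # older wins, younger rolled back: 10 False
tm.commit(older)
```

Aborting the younger side is the simplest policy that preserves sequential semantics for speculative threads. Real coherence protocols may stall the requester instead of aborting it.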
     Based on the work of this dissertation, the following important conclusions are drawn:
(1) Combining thread-level speculation with transactional memory, and coordinating hardware and software mechanisms appropriately, can effectively exploit the potential thread-level parallelism in serial programs. It also reduces the difficulty of parallel programming and greatly improves parallel programming productivity.
(2) The current technology roadmap builds multicore chips from conventional single superscalar cores. Under that roadmap, we balance effective hardware utilization against exploiting as much inherent parallelism as possible. On that basis:
    - desktop applications can use 2 cores efficiently;
    - many multimedia and HPC applications are suited to 8-16 cores;
    - some particularly suitable applications can use 64-128 cores effectively.
(3) Speculative thread-level parallelization does not perform well on desktop applications with severe data dependences. It is, however, suitable for most multimedia and HPC applications that have large amounts of computation, moderate thread sizes, and ambiguous dependences that are easy to resolve. The biggest advantages of speculative thread-level parallelization are compatibility and ease of programming. If these two strengths are exploited well, the technique can hold an important place in computer architecture research.
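The core-count conclusions in item (2) come from the dissertation's own profiling, not from a formula. For intuition only, the sketch below uses Amdahl's law, a standard textbook model and not the PCL model. It shows why the number of cores worth giving one program depends on how much of its work can run in parallel. The sketch picks the largest core count whose parallel efficiency (speedup divided by cores) stays at or above a threshold. The parallel fractions and the 50% efficiency threshold are hypothetical values chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - f) + f / n)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / cores)


def efficiency(parallel_fraction: float, cores: int) -> float:
    """Parallel efficiency: the fraction of the cores' capacity doing useful work."""
    return amdahl_speedup(parallel_fraction, cores) / cores


def suggested_cores(parallel_fraction: float, min_efficiency: float = 0.5,
                    candidates=(1, 2, 4, 8, 16, 32, 64, 128)) -> int:
    """Largest candidate core count whose efficiency stays >= min_efficiency.
    Efficiency only falls as cores are added, so the last passing count wins."""
    best = candidates[0]
    for n in candidates:
        if efficiency(parallel_fraction, n) >= min_efficiency:
            best = n
    return best


for f in (0.5, 0.9, 0.95, 0.99):  # hypothetical parallel fractions
    print(f, suggested_cores(f))  # suggests 2, 8, 16 and 64 cores respectively
```

With these assumed inputs, the suggested counts are 2, 8, 16 and 64 cores. This should not be read as a reproduction of the dissertation's measurements.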
     The work in this dissertation can guide the design of parallel programming models and compilers for shared-memory multicore processor architectures. It can also support the design of high-performance on-chip multicore architectures. Finally, it can help expose more parallelism from applications with less hardware cost, less software complexity and less parallel programming difficulty.