Power Consumption Analysis and Optimization for General-Purpose Computing on Graphics Processing Units
Abstract
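With the arrival of the big data era, many areas of computing research require real-time processing of large-scale data, for example real-time data analysis in social computing, anomalous-flow and content detection in network security, and massive video analysis in image processing. Because general-purpose computing on graphics processing units (GPUs) has developed rapidly and GPUs are well suited to data-intensive workloads, GPUs and GPU clusters have become an important solution for real-time processing of large-scale data. In such processing, energy becomes a computing resource that demands attention, since it constrains computational reliability and system scalability. Effective energy management and optimization is therefore a pressing problem in GPU general-purpose computing, and it is also a requirement of green, energy-saving computing.
     This thesis centers on energy management and optimization in GPU general-purpose computing. It examines the problem layer by layer: GPU energy measurement, power prediction, energy analysis of parallel processing strategies on a single node, and real-time control of GPU cluster energy consumption. Measurement and prediction are the foundation of energy management and optimization; optimization and real-time control are the focus of the research; and keeping the loss of computing performance and reliability to a minimum during optimization is the main difficulty. The thesis surveys the key techniques of GPU general-purpose computing and, following the evolution of GPU architecture, discusses in detail the research topics, methods, and tools for programming models, memory models, communication models, and load balancing, laying the groundwork for research on energy optimization and reliability of GPUs and GPU clusters. On this basis, the thesis makes the following contributions:
     1. Two schemes are proposed for predicting the energy consumption of GPU general-purpose programs. The first analyzes the energy characteristics of instructions in the PTX intermediate language, unrolls loops according to the structure of the program, and counts PTX instructions to estimate the application's computing energy; it is a simple, feasible, and broadly applicable scheme. The second works at the source-code level: it decomposes the program by program slicing and builds prediction models with nonlinear regression and wavelet neural networks. Its novelty is that it further distinguishes program structure, building separate models for branch-sparse and branch-dense applications, which improves prediction accuracy.
     2. For large-scale real-time data processing, general-purpose parallel processing strategies are proposed for computation on a single GPU node; they can be applied to a variety of practical algorithms, and a complex-network clustering algorithm serves as a representative application to verify their effectiveness. The computing energy of the two strategies is analyzed, and suitable application scenarios are given for each. In addition, a fault detection and fault tolerance mechanism is proposed for single-GPU-node computation, so that energy is optimized while computational reliability is preserved.
     3. An energy optimization control system for GPU clusters is proposed. With model predictive control as its core strategy, it adapts to changes in dynamic computing load and adjusts GPU power states in real time to cut redundant energy consumption during computation. A honeynet (network decoy system) is built to collect real network intrusion data, which serve as real engineering data for validating the control system.
     4. A maximum entropy function is used to produce a composite control index that combines computing performance, reliability, and energy consumption. This index is used to improve the energy optimization control system and to reduce the reliability loss caused by the power-state adjustment mechanism. The method overcomes the limitations of the traditional conversion of multiple objectives into a single objective, can correctly identify which candidate solutions are better, and dynamically adjusts the working state of the GPU cluster toward the best computing performance, the best stability, and the lowest energy consumption.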
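The first prediction scheme in item 1 (unroll loops, count PTX instructions, estimate energy from the counts) can be illustrated with a minimal sketch. This is not the thesis's model: the per-opcode energy table `ENERGY_NJ`, the default cost `DEFAULT_NJ`, and the helper names are all hypothetical, and real costs would have to be measured on the target GPU. The parser is deliberately naive and does not handle predicated instructions such as `@%p1 bra`.

```python
from collections import Counter

# Hypothetical per-opcode energy costs in nanojoules (assumed values,
# not measurements from the thesis).
ENERGY_NJ = {"add": 1.0, "mul": 2.0, "ld": 10.0, "st": 10.0}
DEFAULT_NJ = 1.0  # assumed cost for any opcode not listed above


def count_opcodes(body, trip_count=1):
    """Count opcodes in a straight-line PTX-like body repeated trip_count times."""
    counts = Counter()
    for line in body:
        line = line.strip()
        # Skip blanks, comments, directives (.reg, .param) and labels.
        if not line or line.startswith(("//", ".")) or line.endswith(":"):
            continue
        opcode = line.split()[0].split(".")[0]  # "add.f32" -> "add"
        counts[opcode] += trip_count
    return counts


def estimate_energy_nj(counts):
    """Sum count * per-opcode cost over all opcodes."""
    return sum(n * ENERGY_NJ.get(op, DEFAULT_NJ) for op, n in counts.items())


# A loop body assumed to execute 100 times once the loop is unrolled.
loop_body = [
    "ld.global.f32 %f1, [%rd1];",
    "mul.f32 %f2, %f1, %f1;",
    "add.f32 %f3, %f3, %f2;",
]
counts = count_opcodes(loop_body, trip_count=100)
total_nj = estimate_energy_nj(counts)
```

Multiplying the body counts by the trip count stands in for explicit loop unrolling; it only works when the trip count is known statically, which is presumably why the abstract restricts unrolling to simple loop structures.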
With the development of big data, real-time processing of large-scale data streams arises in various application fields, such as social network analysis, abnormal stream and abnormal content detection in network security, and video quality analysis. Owing to the development of general-purpose computing on GPUs and the fact that GPUs are well suited to data-intensive computing, both single GPUs and GPU clusters have become important parallel computing schemes for processing large-scale real-time streams. Energy is an important computing resource in real-time processing, and it limits system reliability and extensibility. Power consumption management and optimization are therefore problems that need to be solved urgently. This work belongs to the field of green computing.
     This thesis focuses on power consumption management and optimization. We study computing power consumption and its optimization from power measurement and power consumption prediction through power-aware parallel strategies to power consumption control for GPU clusters. Power measurement and prediction are the basis of power consumption management and optimization. The main research work is power optimization and real-time control, and the difficult point is the tradeoff with computing performance and reliability. We first summarize the key techniques in GPGPU and discuss research methods and tools for programming models, memory models, communication models, and load balancing, following the development of GPU architecture. This work supports the research on power consumption optimization and system reliability. The contributions of this thesis are as follows.
     1. We propose two power consumption prediction schemes. The first analyzes power consumption features at the PTX level and counts the dynamic instructions by unrolling simple loop structures. This approach is a simple and general prediction model. The second is based on program slicing at the source-code level. It first decomposes the program into slices and builds the slice prediction model with nonlinear regression and a wavelet neural network. The contribution of the second model is that it distinguishes the control structure of the program: branch-sparse and branch-dense models are built separately in order to improve prediction accuracy.
     2. For large-scale real-time data processing, we propose two general parallel processing strategies on a single GPU that can be applied to various algorithms. A complex-network clustering algorithm is used to verify these strategies. We also analyze the power consumption of the two strategies and describe the application scenarios that suit each. Finally, a fault detection and recovery mechanism is proposed to guarantee system reliability.
     3. A power consumption optimization control system is designed on the basis of model predictive control theory, so that it can adapt to variation in workloads. The controller reduces redundant power consumption in real-time computing. We build a honeynet to capture abnormal network packets and use them to verify the validity of the power consumption control system.
     4. A reliability-aware power consumption controller is proposed that uses the maximum entropy method to combine performance, reliability, and power consumption into a comprehensive control variable. This control system reduces the reliability cost caused by the power-state adjustment mechanism. The method overcomes the limitation of the traditional approach that transforms a multi-objective function into a single-objective function, and it can distinguish the quality of candidate solutions. The controller dynamically adjusts the power state of the GPU cluster to reach the best state in terms of performance, reliability, and power consumption.
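Item 4 combines several objectives into one control variable with a maximum entropy method. One common reading of that term, assumed here and not confirmed by the abstract, is the aggregate function $F_p(c) = \frac{1}{p}\ln\sum_i e^{p\,c_i}$, a smooth approximation of $\max_i c_i$. The sketch below applies it to hypothetical normalized costs; the candidate power states, their cost values, and the parameter `p` are illustrative assumptions only.

```python
import math


def max_entropy_index(costs, p=20.0):
    """Smooth approximation of max(costs): (1/p) * ln(sum(exp(p * c)))."""
    m = max(costs)  # shift by the maximum for numerical stability
    return m + math.log(sum(math.exp(p * (c - m)) for c in costs)) / p


# Hypothetical candidate power states with costs normalized to [0, 1]:
# (performance loss, reliability loss, energy consumption).
candidates = {
    "high_freq": (0.10, 0.20, 0.90),
    "mid_freq": (0.40, 0.30, 0.50),
    "low_freq": (0.80, 0.60, 0.20),
}

# Choose the state whose worst objective, smoothed, is smallest.
best = min(candidates, key=lambda name: max_entropy_index(candidates[name]))
```

Unlike a fixed weighted sum, this index is dominated by whichever objective is currently worst, so a candidate cannot look good by excelling on two objectives while failing the third. Larger `p` tracks the true maximum more closely at the cost of a less smooth function.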
