A GPU-Based Mixed-Precision Conjugate Gradient Squared Algorithm
Abstract
Current GPU-based numerical algorithms suffer from low double-precision performance. We propose a mixed-precision conjugate gradient squared (CGS) algorithm, suited to the Fermi-CUDA unified computing architecture, for solving sparse linear systems. The algorithm combines a single-precision inner iteration with a double-precision outer iteration, exploiting both the high throughput of single-precision arithmetic and the accuracy of double-precision arithmetic on the GPU. All computation is carried out on the GPU, which reduces data transfer between the CPU and the GPU. GPU implementations of the CGS method, the Jacobi iteration, and the Gauss-Seidel iteration are provided, and their influence on convergence when used as the inner iteration operator is analyzed. Experiments show that the algorithm attains the same accuracy as a full double-precision computation, nearly doubles floating-point performance relative to the full double-precision GPU implementation, and achieves a maximum speedup of more than 70 over the serial full double-precision CPU algorithm.
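The combination of a single-precision inner iteration and a double-precision outer iteration described in the abstract can be sketched roughly as follows. This is a minimal CPU-side illustration in pure Python, not the authors' GPU implementation: it assumes a small dense matrix, uses Jacobi (one of the inner operators named in the abstract) instead of CGS for brevity, and emulates single precision by rounding every operation through the hypothetical helper `f32`. The function names, tolerance, and sweep count are all illustrative assumptions.

```python
import struct


def f32(v):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def residual(A, x, b):
    """Residual r = b - A x, evaluated in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]


def jacobi_f32(A32, r32, sweeps):
    """Approximately solve A c = r with Jacobi sweeps in emulated float32."""
    n = len(r32)
    c = [0.0] * n
    for _ in range(sweeps):
        new = []
        for i in range(n):
            s = r32[i]
            for j in range(n):
                if j != i:
                    # Each multiply and subtract is rounded to single precision.
                    s = f32(s - f32(A32[i][j] * c[j]))
            new.append(f32(s / A32[i][i]))
        c = new
    return c


def mixed_precision_solve(A, b, tol=1e-12, inner_sweeps=10, max_outer=100):
    """Outer refinement loop in double, inner correction solve in float32.

    Returns the solution and the number of corrections applied.
    """
    A32 = [[f32(v) for v in row] for row in A]  # single-precision copy of A
    x = [0.0] * len(b)
    for outer in range(max_outer):
        r = residual(A, x, b)                    # double-precision residual
        if max(abs(v) for v in r) <= tol:
            return x, outer
        c = jacobi_f32(A32, [f32(v) for v in r], inner_sweeps)
        x = [xi + ci for xi, ci in zip(x, c)]    # double-precision update
    return x, max_outer


# Illustrative diagonally dominant system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, n_outer = mixed_precision_solve(A, b)
```

Because the residual and the solution update stay in double precision, the final accuracy is governed by the outer loop; the inner solver only needs to reduce the error by some factor on each pass, which is why it can run in the faster, lower-precision arithmetic.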
