非结构网格下稀疏下三角方程求解器众核优化技术研究

设为首页

收藏本站

网站地图 | English | 公务邮箱

读者指南

学术客户端

NSTL服务站

科技查新

非结构网格下稀疏下三角方程求解器众核优化技术研究

详细信息查看全文 | 推荐本文 |

英文篇名：Many-core Optimization for Sparse Triangular Solver Under Unstructured Grids
作者：倪鸿 ; 刘鑫
英文作者：NI Hong;LIU Xin;National Research Centre of Parallel Computer Engineering and Technology;
关键词：稀疏下三角方程求解器 ; 非结构网格 ; SW26010 ; 异构众核优化 ; 并行算法
英文关键词：SpTRSV;;Unstructured-grids;;SW26010;;Heterogeneous many-core Optimization;;Parallel algorithm
中文刊名：JSJA
英文刊名：Computer Science
机构：国家并行计算机工程技术研究中心;
出版日期：2019-06-15
出版单位：计算机科学
年：2019
期：v.46
基金：“全球变化和应对”专项(2016YFA0602200)资助
语种：中文;
页：JSJA2019S1111
页数：5
CN：S1
ISSN：50-1075/TP
分类号：528-532

摘要

稀疏下三角方程求解器(SpTRSV)作为基础线性代数库中一个重要的算法,在大规模科学计算中有着广泛应用。在非结构网格中,由于非结构网格具有数据存储无序性、数据强相关性以及频繁地离散访存等特点,该算法在众核架构上难以实现有效的并行。文中基于国产异构众核处理器SW26010体系结构的特点,针对非结构网格计算,提出了一种基于流水线串行-局部并行思想的通用众核优化方法。该方法能够有效减少非结构网格计算中的随机访存,提高计算效率,并且具有很好的扩展性。基于该算法对多个实际应用算例进行众核优化,实验结果表明:该方法能够实现单核组3倍以上的加速,显著降低了运行时间。
Sparse Triangular Solver(SpTRSV),as an important algorithm in basic linear algebraic library,has been widely used in large-scale scientific computing.In unstructured-grids,because unstructured grid have the characte-ristics of data storage disorder,data depth correlation and frequent discrete-time memory access,this algorithm is difficult to achieve effective parallelism in the many-core architecture.In this paper,based on the architecture of the domestic heterogeneous multiprocessor SW26010 architecture,a general kernel optimization method based on pipelined serial and local parallel was proposed for unstructured grid computing.This method can effectively reduce random access in unstructured grid computing,improve the computing efficiency,and have the good scalability.Based on this algorithm,multiple kernel optimization is carried out for several practical applications.The experimental results show that the method can achieve more than 3 times acceleration of the single core group and significantly reduce the running time.

引文

[1] ANDERSON E,SAAD Y.Solving sparse triangular linear systems on parallel computers[J].International Journal of High Speed Computing,1989,1(1):73-95.
    [2] DUFF I S,ERISMAN A M,REID J K.Direct Methods for Sparse Matrices[J].Matehematics of Computation,1986,52(185):250.
    [3] SAAD Y.Iterative methods for sparse linear systems[OL].http://ieeexplore.ieee.org/ielx5/99/12132/001231631.pdf.
    [4] REGULY I Z,MUDALIGE G R,BERTOLLI C,et al.Acceleration of a full-scale industrial cfd application with op2[J].IEEE Transactionson Parallel and Distributed Systems,2016,27(5):1265-1278.
    [5] YAO C,YANG X,WANG W,et al.26 pflops stencil computations for atmospheric modeling on sunway taihulight[C]//2017 IEEE International Parallel and Distributed Processing Sympo-sium(IPDPS).IEEE,2017:535-544.
    [6] CHEN T,LI M,LI Y,et al.Mxnet:A flexible and efficient machine learning library for heterogeneous distributed systems[J].arXiv preprint arXiv:1512.01274,2015.
    [7] ANDERSON E,SAAD Y.Solving sparse triangular linear systems on parallel computers[J].International Journal of High Speed Computing,1989,1(1):73-95.
    [8] KALIKAR S,NASRE R.Domlock:A new multi-granularity locking technique for hierarchies[C]//Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,2016.
    [9] SCHREIBER R,TANG W P.Vectorizing the conjugate gradient method[D].Los Angeles:Stanford University,1982.
    [10] LIU W,LI A,HOGG J D,et al.Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides[J].Concurrency and Computation:Practice and Experience,2017,29(21):e4244-n/a.
    [11] PICCIAU A,INGGS G E,WICKERSON J,et al.Balancing locality and concurrency:Solving sparse triangular systems on gpus[C]//2016 IEEE 23rd International Conference on High Performance Computing(HiPC).2016:183-192.
    [12] 敖玉龙.国产大型众核系统上稀疏矩阵和Stencil运算的性能优化关键技术研究[D].北京:中科院软件研究所,2017:27-47.
    [13] 王欣亮.关键稀疏数值计算核心在国产众核架构上的性能优化研究[D].北京:清华大学,2018:24-43.
    [14] QI F B.Sunway TaihuLight super computer[J].Communications of the CCF,2017,13(10):16-22.
    [15] ANDERSON W K,GROPP W D,KAUSHIK D K,et al.Achieving High Sustained Performance in an Unstructured Mesh CFD Application[C]//Conference on High Performance Computing (Super Computing).1999.
    [16] 林恒.基于超大规模异构体系结构的图计算系统研究[D].北京:清华大学,2018:23-53.

常见问题　|　交通位置　|　联系我们　|　OA远程办公

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700