A Fine-Grained Parallel Optimization Method for Large-Scale Data Based on the PMVS Algorithm


English title: Fine-Grained Parallel Optimization of Large-Scale Data for PMVS Algorithm
Authors: LIU Jinshuo (刘金硕); LI Yangmei (李扬眉); JIANG Zhuangyi (江庄毅); DENG Juan (邓娟); SUI Haigang (眭海刚); PAN Jeff
Author affiliations (English): School of Cyber Science and Engineering, Wuhan University; School of Computer Science, Technical University of Munich; School of Computer Science, Wuhan University; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University; Department of Computing Science, University of Aberdeen
Keywords (translated from Chinese): CPUs_GPUs multi-granularity parallelism; GPU parallel optimization; CUDA; load balancing; storage and communication optimization; image processing
English keywords: CPUs_GPUs multi-granularity parallel; GPU parallel optimization; CUDA; load balancing; storage and communication optimization; image processing
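Chinese journal title: WHCH
English journal title: Geomatics and Information Science of Wuhan University
Institutions: School of Cyber Science and Engineering, Wuhan University; School of Computer Science, Technical University of Munich, Munich, Germany; School of Computer Science, Wuhan University; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University; School of Computer Science, University of Aberdeen
Publication date: 2019-01-07 16:50
Publisher: 武汉大学学报(信息科学版) (Geomatics and Information Science of Wuhan University)
Year: 2019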
Volume: 44
Funding: National Natural Science Foundation of China (61672393, U1536204)
Language: Chinese
Document ID: WHCH201904019
Page count: 9
Issue: 04
CN: 42-1676/TN
Pages: 137-145

Abstract

The patch-based multi-view stereo (PMVS) algorithm is widely used in digital-city applications and other fields because of its good three-dimensional reconstruction quality, but it executes inefficiently on large-scale workloads. To address this limitation, this paper proposes a fine-grained parallel optimization method covering three aspects: task allocation and load balancing, main system memory and GPU memory strategies, and communication overhead. On the CPU side, multi-threading is implemented with the pthreads library to take full advantage of multi-core CPUs. On the GPU side, the CUDA (compute unified device architecture) framework is used, with optimized thread organization and memory access. A memory pool model and a pipelining model are adapted to improve bandwidth utilization: the memory pool model reduces the impact of data transfers over the bus while the CPUs and GPUs wait for resources, and the pipelining model hides the communication time the CPU spends reading data from memory. A GPU and multi-threaded parallel version of the patch-based PMVS feature extraction is designed on this basis, realizing cooperative multi-granularity CPUs_GPUs parallelism.

The optimization strategies are verified on the Harris-DOG feature extraction of PMVS applied to image sequences. Experiments show that the multi-threaded CPU strategy achieves a speed-up ratio of 4, the CUDA-based parallel strategy achieves a speed-up ratio of up to 34, and the proposed strategy improves performance by a further 30% over the CUDA-based parallel strategy. The strategy can also be applied to fast scheduling of computing resources for big-data processing in other domains.
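
The combination of a memory pool and a pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which is built on pthreads and CUDA; it is a simplified Python analogue that assumes a single loader thread and a single worker. The function name `run_pipeline` and its parameters `load`, `process`, and `pool_size` are hypothetical names chosen for illustration.

```python
import queue
import threading


def run_pipeline(items, load, process, pool_size=2):
    """Overlap load() with process() using a fixed pool of reusable buffers.

    Illustrative only: names and structure are assumptions, not the paper's code.
    """
    free = queue.Queue()   # buffers the loader may fill
    ready = queue.Queue()  # filled buffers waiting for the worker
    for _ in range(pool_size):
        free.put([None])   # each buffer is allocated once and then recycled

    def loader():
        for item in items:
            buf = free.get()      # blocks while every buffer is in use
            buf[0] = load(item)   # stands in for reading an image from memory
            ready.put(buf)
        ready.put(None)           # sentinel: no more input

    thread = threading.Thread(target=loader)
    thread.start()

    results = []
    while True:
        buf = ready.get()
        if buf is None:
            break
        results.append(process(buf[0]))  # stands in for feature extraction
        free.put(buf)                    # return the buffer to the pool
    thread.join()
    return results
```

While the worker processes one buffer, the loader is already filling the next, so loading time is hidden behind computation whenever the two stages take comparable time. The pool bounds memory use to `pool_size` buffers no matter how many items are processed, and results come back in input order because both queues are first-in, first-out.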

