期刊文献+

迭代方法中基于渐近规模的通信与计算比分析 预览 被引量:3

Analysis of Communication-to-Computation Based on Asymptotic Size for Iterative Methods
在线阅读 下载PDF
收藏 分享 导出
摘要 迭代方法是科学计算中求解大规模稀疏线性代数方程组最常用的方法.大量实际应用表明,迭代方法通常具有较高的通信与计算比,只有在粗粒度并行下才能取得较好的并行可扩展性能.而实际应用大规模计算的需求和当前多核/众核体系结构的发展趋势要求迭代方法具备细粒度并行可扩展能力.文中引入渐近规模,即满足加速条件的计算规模下界,来反映并行迭代方法适应细粒度并行的能力,并由此刻画通信与计算比.基于矩阵的稀疏模式及其通信模式、机器的通信参数和迭代方法的基本运算,给出了渐近规模的理论预测公式.在一台包含128个双路4核计算节点的并行机上,分别基于纯进程并行(MPI)和进程/线程混合并行(MPI/OpenMP),以实际应用中3种常用迭代方法Jacobi、CG、BiCGSTAB为例,分析其渐近规模.并行可扩展性测试表明了渐近规模用于刻画迭代方法通信与计算比的准确性.对于纯进程情形,给出了渐近规模的理论预测与实际测试的对比,表明了理论预测结果的正确性.最后,基于这些结果,从迭代方法的算法设计和并行实现等方面讨论了面向未来更大规模的计算系统,降低通信与计算比的途径. Iterative method are one of the most efficient numerical algorithms to solve large-scale sparse linear systems arising from scientific computing. The parallel scalability of iterative method can be measured by the communication-to-computation (CtC) during the iterative process. The CtC is high for many iterative methods, as a result, coarse-grain parallelism is needed in order to obtain expected scalability. However, fine-grain parallelism is required for the complicacy of architecture increasing with multi/many-cores. In this paper, we introduce the concept of asymptotic size, which defined as the low bound of the problem size such that satisfying the speedup condition that parallel speedup is more than 1. We hope the asymptotic size can be used to describe the CtC and the ability of fine-grain parallelism for iterative method. Moreover, the theoretical prediction formula of asymptotic size is obtained based on the following parameters, the sparse and communication pattern of matrix, communication parameters of machine, and combination of basic operations of iterative methods. Using asymptotic size, the CtC is analyzed for three popular iterative methods, including Jacobi, CG, BiCGSTAB, on a MPP machine with 128 Double Quad-core nodes. The performance results are given for both MPI-Only and MPI/ OpenMP programming model, which show the usefulness of the asymptotic size for describing the CtC of iterative methods. For MPI-Only case, we also give the comparison of the prediction results and experiment results, which show the validation of the formula of asymptotic size. Finally, future research topics for improving the scalability of the iterative methods on more power supercomputers also discussed based on the analysis conclusions.
作者 徐小文 莫则尧 武林平 XU Xiao-Wen MO Ze-Yao WU Lin-Ping (Institute of Applied Physics and Computational Mathematics, Beijing 100094)
出处 《计算机学报》 EI CSCD 北大核心 2013年第4期782-789,共8页 Chinese Journal of Computers
基金 国家“九七三”重点基础研究发展规划项目基金(2011CB309702) 国家“八六三”高技术研究发展计划项目基金(2012AA01A309)资助~~
关键词 迭代方法 通信与计算比 并行可扩展 渐近规模 多核 众核体系结构 iterative method communication-to-computation parallel scalability asymptoticsize multi/many-cores
作者简介 徐小文,男,1978年生,博士,副研究员,中国计算机学会(CCF)会员,主要研究方向为并行计算.E—mail:xwxu@iapcm.ac.cn. 奠则尧,男,1971年生,博士,研究员,主要研究领域为并行计算. 武林平,男,1977年生,博士,副研究员,主要研究方向为大规模计算系统的高效管理与容错机制.
  • 相关文献

参考文献10

  • 1Saad Y. Iterative Methods for Sparse Linear Systems. 2ndEdition. Philadelphia:SIAM Publisher, 2003. 被引量:1
  • 2Ang J, Evans K, Geist A, Heroux M, Hovland P, MarquesO,Curfman L,Ng E,Wild S. Extreme-scale solvers:Transi-tion to future architectures. DOE ASCR Report 2012-3,2012. 被引量:1
  • 3Gahvari H, Baker H,Schuiz M, Yang U, Jordan K,GroppW. Modeling the performance of an algebraic multigrid cycleon HPC platforms//Proceedings of the 25th InternationalConference on Super Computing (ICS 2011 ). Tucson, AZ,2011:172-181. 被引量:1
  • 4Ghysels P, Ashby T,Meerbergen K, Vanroose W. Hidingglobal communication latency in the GMRES algorithm onmassively parallel machines. Intel Exascience Lab, Leuven,Belgium:Technical Report 04. 2012. 1,2012. 被引量:1
  • 5李晓梅等著..可扩展并行算法的设计与分析[M].北京:国防工业出版社,2000:189.
  • 6陈国良著..并行计算 结构·算法·编程 第3版[M].北京:高等教育出版社,2011:573.
  • 7莫则尧.实用的并行程序性能分析方法[J].数值计算与计算机应用,2000,21(4):266-275. 被引量:5
  • 8徐小文,莫则尧.并行代数多重网格算法可扩展性能分析[J].计算物理,2007,24(4):387-394. 被引量:6
  • 9Mo Ze-Yao, Zhang Ai-Qing, Cao Xiao-Lin, Liu Qing-Kai,Xu Xiao-Wen, An Heng-Bin, Pei Wen-Bin,Zhu Shao-Ping.JASMIN:A parallel software infrastructure for scientificcomputing. Frontiers of Computer Science in China,2010,4(4):480-488. 被引量:1
  • 10Ballard G, Demmel J, Holtz 0? Schwartz O. Minimizingcommunication in numerical linear algebra. SIAM Journal onMatrix Analysis and Applications, 2011, 32 (3):866-901. 被引量:1

二级参考文献28

  • 1徐小文 ,莫则尧 .一种新的并行代数多重网格粗化算法[J].计算数学,2005,27(3):325-336. 被引量:5
  • 2莫则尧,李晓梅.工作站网络环境下的并行计算[J].计算机学报,1997,20(6):510-517. 被引量:32
  • 3Brandt A.Multi-level adaptive solution to boundary-value problems[J].Math Comp,1977,31(138):333-390. 被引量:1
  • 4Brandt A,McCormick S F,Ruge J.Algebraic multigrid(AMG) for automatic multigrid solution with application to geodetic computations[R].Institute for Computational Studies,Colorado,1982. 被引量:1
  • 5Brandt A.Algebraic multigrid theory:the symmetric case[J].App Math Comp,1986,19:23-56. 被引量:1
  • 6Adams M F,Bayraktar H H,Keaveny T M,Papadopoulos P.Applications of algebraic multigrid to large-scale finite element analysis of whole bone micro-mechanics on the IBM SP[A].International Super Computing Conference,2003. 被引量:1
  • 7Adams M F,Bayraktar H H,Keaveny T M,Papadopoulos P.Ultrascalble implicit finite element analyses in sold mechanics with over a half a billion degrees of freedom[A].International Super Computing Conference,2004. 被引量:1
  • 8Cleary A J,Falgout R D,Henson V E,Jones J E,Manteuffel T A,McCormick S F,Miranda G N,Ruge J W.Robustness and scalability of algebraic multigrid[J].SIAM J on Sci Comput,2000,21(5):1886-1908. 被引量:1
  • 9Hypre:High-performance Preconditioning Library[EB/OL].http://acts.nersc.gov/hypre/main.html. 被引量:1
  • 10Krechel A,Stuben K.Parallel algebraic multigrid based on subdomain blocking[R].GMD Report 71,Nov,1999. 被引量:1

共引文献9

同被引文献39

  • 1徐小文,莫则尧.并行代数多重网格算法可扩展性能分析[J].计算物理,2007,24(4):387-394. 被引量:6
  • 2武林平,魏勇,刘旭.多核集群中系统嗓音的测最[C]//2012全国高性能计算学术年会.北京:中国计算机学会,2012:1-5. 被引量:1
  • 3Gioiosa R, Petrini F, Davis K, et al. Analysis of system overhead on parallel computers [C] //Proe of the 4th IEEE Int Symp on Signal Processing and Information Technology. Piscataway, NJ: IEEE, 2004:387-390. 被引量:1
  • 4Beckman P, lskra K, Yoshii K, et al. Benchmarking the effects of operating system interference on extreme-scale parallel machines [J]. Cluster Computing, 2008, 11 (1) : 3- 16. 被引量:1
  • 5Herowx M A. HPCCG; A simple conjugate gradient benchmark code for a 3D chimney domain on an arbitrary numher of processors [CP/OL]. [2014-03-13]. http://www. mantevo, org/downloads/HPCCG-1.0, html. 被引量:1
  • 6Hoefler T, Schneider T, i.urusdaine A. Characterizing the influence of system noise on large scale applications hy simulation [C] //Proc of the 2010 ACM/IEEE Int Conf for High Performance Computing, Networking, Storage and Analysis. Piseataway, NJ: IEEE, 2010:1-11. 被引量:1
  • 7Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, B: Instruction Set Reference, N-Z [M]. Santa Clara, California: Intel Corporation, 2010:251-252. 被引量:1
  • 8Dhabaleswar K. Osu micro-benchmarks [CP/OL]. [2014-03- 13]. http://mvapieh, cse. ohio-state, edu/benchmarks/. 被引量:1
  • 9Petrini F, Kerbyson D K, Pakin S. The case of the missing supercomputer performance: Achieving opllmal performance on the 8192 processors of ASCI Q [C] //Proc of the 2003 ACM/IEEE Con{ on Supereamputing. Piseataway, N J: IEEE, 2003, 55-55. 被引量:1
  • 10Ferreira K B, Bridges P, Brightwell R. Characterizing application sensitivity to OS interference using kernel level noise injection [C] //Proc of the 2008 ACM/IEEE Conf on Supereomputing. Piscataway, NJ: IEEE, 2008: 19-29. 被引量:1

引证文献3

二级引证文献3

投稿分析

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部 意见反馈