Performance Evaluation of NAS Parallel and High-Performance Conjugate Gradient Benchmarks in Mahameru
DOI: https://doi.org/10.15575/join.v10i2.1557

Keywords: Conjugate Gradient Algorithm, High-Performance Computing, MPI vs OpenMP, Supercomputing Performance, Parallel Computing
License
Copyright (c) 2025 Taufiq Wirahman, Arnida L Latifah, Furqon Hensan Muttaqien, I Wayan Aditya Swardiana, Andria Arisal, Syam Budi Iryanto, Rifki Sadikin

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.