Time-stamp Incremental Checkpointing and Its Application for an Optimization of Execution Model to Improve Performance of CAPE
DOI: https://doi.org/10.31449/inf.v42i3.2244

Abstract
CAPE, which stands for Checkpointing-Aided Parallel Execution, is a checkpoint-based approach to automatically translating and executing OpenMP programs on distributed-memory architectures. This approach has demonstrated high performance and complete compatibility with OpenMP on distributed-memory systems. In CAPE, checkpointing is one of the main factors affecting the performance of the system. The two existing versions of CAPE show this clearly: the first, based on complete checkpoints, is much slower than the second, based on Discontinuous Incremental Checkpointing. This paper presents an improvement of Discontinuous Incremental Checkpointing and a new execution model for CAPE that uses these new checkpointing techniques. These contributions improve the performance of CAPE and make it even more flexible.
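The contrast between complete and incremental checkpoints can be made concrete with a small model. The following is a minimal Python sketch, not CAPE's implementation. It models a process's memory as a flat list of integers and treats an incremental checkpoint as the set of locations that changed between two program points. It then simulates, in the spirit of CAPE's checkpoint-based execution of an OpenMP parallel loop, several nodes that each run a share of the iterations on their own copy of memory and return only their modifications for merging. All names (`complete_checkpoint`, `incremental_checkpoint`, `apply_checkpoint`, `worker`, `parallel_for`) are hypothetical. A real checkpointer captures page-level modifications of a running process rather than list entries.

```python
from typing import Dict, List

# Hypothetical simplification: a memory image is a flat list of integers.
Memory = List[int]
# An incremental checkpoint maps address -> new value, for changed locations only.
Delta = Dict[int, int]


def complete_checkpoint(mem: Memory) -> Memory:
    """Complete checkpoint: a copy of the entire memory image."""
    return list(mem)


def incremental_checkpoint(before: Memory, after: Memory) -> Delta:
    """Record only the locations whose value changed between two program points.
    Assumes both images have the same length."""
    return {addr: new for addr, (old, new) in enumerate(zip(before, after)) if old != new}


def apply_checkpoint(mem: Memory, delta: Delta) -> None:
    """Write the recorded modifications into a memory image, in place."""
    for addr, value in delta.items():
        mem[addr] = value


def worker(mem: Memory, lo: int, hi: int) -> Delta:
    """Simulate one node running iterations [lo, hi) of the loop
    `for i in range(n): a[i] = i * i` on a private copy of memory,
    then returning only what it modified."""
    local = list(mem)
    for i in range(lo, hi):
        local[i] = i * i
    return incremental_checkpoint(mem, local)


def parallel_for(mem: Memory, n_workers: int) -> List[Delta]:
    """Split the iterations into contiguous chunks, run each chunk on a simulated
    node starting from the same memory image, then merge all deltas into `mem`.
    Merging order does not matter here because the iterations write disjoint
    locations, as in a correctly parallelized OpenMP loop."""
    n = len(mem)
    chunk = (n + n_workers - 1) // n_workers
    deltas = [worker(mem, k * chunk, min(n, (k + 1) * chunk)) for k in range(n_workers)]
    for d in deltas:
        apply_checkpoint(mem, d)
    return deltas


if __name__ == "__main__":
    mem = [-1] * 8
    deltas = parallel_for(mem, 3)
    print(mem)                                 # [0, 1, 4, 9, 16, 25, 36, 49]
    print([len(d) for d in deltas])            # [3, 3, 2]
    print(len(complete_checkpoint([-1] * 8)))  # 8
```

Each simulated node ships a delta covering only its own chunk, whereas a complete checkpoint always carries the whole image. This is the intuition behind the speed gap between the two CAPE versions described above.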
Copyright © Slovenian Society Informatika