What is checkpointing? (2024)

What is Checkpointing?

Checkpointing is the process of periodically saving (or writing) the execution state of an application so that, if the execution is interrupted, the saved state can be used to continue it at a later time. Typically, the execution state is written to a file. Resuming the execution of an application from a previously saved state, or checkpoint, instead of starting it from scratch, is referred to as the Restart phase.
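A minimal sketch of saving state to a file and reading it back at restart is shown below. It assumes the state fits in a plain struct; the struct app_state type and the function names are hypothetical. The state is first written to a temporary file and then renamed over the previous checkpoint, so an interruption during the write leaves the last good checkpoint intact. This relies on POSIX rename() semantics, which replace an existing file atomically.

```c
#include <stdio.h>

/* Hypothetical application state: a real application saves whatever it
   needs in order to resume. */
struct app_state {
    long step;      /* next iteration to execute */
    double value;   /* result accumulated so far */
};

/* Checkpoint: write to "<path>.tmp", then rename it over <path>.
   Assumes POSIX rename(), which atomically replaces an existing file. */
int write_checkpoint(const char *path, const struct app_state *s) {
    char tmp[512];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                          /* path too long */
    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) {
        remove(tmp);                        /* keep the old checkpoint */
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Restart: returns 1 if a checkpoint was loaded, 0 if none could be opened
   (start from scratch), -1 if the file was incomplete. */
int read_checkpoint(const char *path, struct app_state *s) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 1 : -1;
}
```

Writing a struct's raw bytes is only safe when the same build on the same platform reads the file back. A production version would use a self-describing format and also flush the data to stable storage before the rename (for example with fsync() on POSIX systems).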

What are the advantages of checkpointing?

Checkpointing saves time by allowing an application to resume execution, rather than start over, after a hardware failure in the underlying computing platform (e.g., a network interconnect failure) or when the platform becomes unavailable because of emergency maintenance. It also helps in overcoming the time limits associated with the different job queues/partitions: a long-running job can be split into several shorter runs, each of which restarts from the checkpoint written by the previous one.
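To get past a queue's time limit, an application can monitor its own elapsed wall-clock time and, shortly before the limit, write a checkpoint and exit so the job can be resubmitted and restarted. The minimal sketch below shows only the decision logic. The function names, the safety margin, and the assumption that the job knows its limit in seconds are illustrative and are not part of any scheduler's API.

```c
#include <stdbool.h>
#include <time.h>

/* Decide whether to write a final checkpoint and exit now. All values are in
   seconds: 'limit' is the wall-clock limit the job was submitted with,
   'ckpt_cost' is an estimate of how long one checkpoint takes to write, and
   'margin' is extra slack. (Hypothetical helper, not a scheduler API.) */
bool time_to_stop(double elapsed, double limit, double ckpt_cost, double margin) {
    return elapsed + ckpt_cost + margin >= limit;
}

/* Seconds elapsed since 'start', using the C library's calendar clock
   (one-second resolution on most systems). */
double seconds_since(time_t start) {
    return difftime(time(NULL), start);
}
```

In the main loop, the application would record start = time(NULL) at startup and, after each iteration, call time_to_stop(seconds_since(start), limit, ckpt_cost, margin). When it returns true, the application writes a checkpoint and exits normally instead of being killed by the scheduler.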

What are the different types of checkpointing?

The different types of checkpointing include system-level checkpointing, library-level (or user-level) checkpointing, and application-level checkpointing.

System-Level checkpointing involves taking core-dumps of the computational state of the machine or system on which the application is running.

Pros: It is convenient to use, since no code changes are needed and the user only specifies the checkpointing frequency.

Cons: Checkpoints have a large memory footprint because the entire execution state of the application and the operating-system processes is saved, and system-administrator privileges are needed to install the additional software.

Example: Berkeley Lab Checkpoint/Restart (BLCR)

Library-Level or User-Level Checkpointing involves the use of libraries for taking checkpoints while being agnostic to kernel-level information such as process IDs.

Pros: It is useful for checkpointing applications without requiring any changes to the source code or the operating-system kernel.

Cons: Users may need to load the checkpointing library before starting their applications and dynamically link it to them. The checkpoints can also have a large memory footprint.

Example: DMTCP

Application-Level Checkpointing involves implementing the checkpoint-and-restart mechanism within the application itself. An efficient implementation of application-level checkpointing saves and reads back only those variables or data that are necessary for recreating the state of the entire application. Such variables or data are referred to as critical variables/data. As an example, consider the C code below (the definitions of myFct and randomNumber are not included).

```c
#include <math.h>

int myFct(int arg);        /* defined elsewhere (not shown) */
extern int randomNumber;   /* defined elsewhere (not shown) */

int main(void) {
    int x = 4;
    int y = sqrt(x);
    int z = 0, i;          /* z must be initialized before it is accumulated */
    int j = x * y;

    for (i = 0; i < 100; i++) {
        z += j * myFct(randomNumber * i);
    }

    return 0;
}
```

In this code, i and z are critical variables: their values change on every loop iteration and cannot easily be derived when recreating the execution state after an interruption. By contrast, x, y, and j can simply be recomputed on restart.
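One way to add checkpoint-and-restart to the example above is sketched below. It saves only i and z every `every` iterations and reloads them at startup, while x, y, and j are recomputed. Because the article does not show myFct or randomNumber, the sketch substitutes hypothetical deterministic placeholders for both. The approach reproduces the uninterrupted result only if randomNumber has the same value after a restart; otherwise it would have to be saved as well. The stop_after parameter exists only to simulate an interruption, and run() returns -1 when interrupted or when a checkpoint cannot be written.

```c
#include <stdio.h>

/* Hypothetical placeholders: the article does not show myFct or
   randomNumber, so these only make the sketch compile and run. */
static int myFct(int arg) { return arg % 7; }
static const int randomNumber = 3;

/* Save only the critical variables (overwritten in place for brevity). */
static int save_critical(const char *path, int i, int z) {
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d %d\n", i, z) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Restore i and z; returns 1 if a complete checkpoint was read, else 0. */
static int load_critical(const char *path, int *i, int *z) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    int n = fscanf(f, "%d %d", i, z);
    fclose(f);
    return n == 2;
}

/* The article's loop with application-level checkpointing. 'every' must be
   positive; a negative 'stop_after' runs to completion. */
int run(const char *path, int every, int stop_after) {
    int x = 4;
    int y = 0;
    while ((y + 1) * (y + 1) <= x)   /* integer sqrt(x), so no libm needed */
        y++;
    int j = x * y;                   /* x, y, j: recomputed, never saved */

    int i = 0, z = 0;                /* critical variables */
    if (!load_critical(path, &i, &z)) {
        i = 0;                       /* no checkpoint: start from scratch */
        z = 0;
    }

    int done = 0;                    /* iterations executed in this run */
    for (; i < 100; i++) {
        if (i % every == 0 && save_critical(path, i, z) != 0)
            return -1;               /* could not write the checkpoint */
        if (stop_after >= 0 && done == stop_after)
            return -1;               /* simulated interruption */
        z += j * myFct(randomNumber * i);
        done++;
    }
    remove(path);                    /* finished: discard the checkpoint */
    return z;
}
```

For example, run("ckpt.txt", 10, 25) stops after 25 iterations, leaving a checkpoint at i = 20. A subsequent run("ckpt.txt", 10, -1) resumes from i = 20 and returns the same value as an uninterrupted run.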

Pros: Application-level checkpointing does not rely on the availability of any external libraries or tools and is therefore useful for writing portable applications.

Cons: An efficient implementation of this technique produces checkpoints with a smaller memory footprint and incurs less I/O overhead than the other types of checkpointing, but the onus is on the user (or developer) to implement it manually for each application. Users must therefore understand the code of the applications they are checkpointing well enough to reengineer it and insert the checkpoint-and-restart logic.

For distributed applications (message-passing or MPI applications), a checkpoint can be written either as a "central checkpoint" by a single process (typically the root, master, or manager process in the MPI world) or as a distributed checkpoint involving multiple processes and an appropriate parallel I/O strategy and API calls.

What are the side effects of checkpointing?

Writing and reading checkpoints introduces additional I/O overhead. Depending on the checkpointing frequency and the size of the checkpoint files, this overhead can noticeably increase the run time and storage needs of an application.
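The trade-off between checkpointing frequency and run time can be estimated with simple arithmetic. The sketch below gives first-order estimates only. It assumes no failures occur, each checkpoint takes a fixed time to write while the application waits, and no checkpoint is written after the final piece of work. The function names are illustrative.

```c
/* Number of checkpoints written during 'compute_s' seconds of work when one
   is taken after every 'interval_s' seconds of work (none after the final
   segment, since the run is then complete). Invalid inputs yield 0. */
long checkpoint_count(long compute_s, long interval_s) {
    if (compute_s <= 0 || interval_s <= 0)
        return 0;
    return (compute_s - 1) / interval_s;
}

/* Extra wall-clock seconds spent writing checkpoints, assuming each write
   takes 'write_s' seconds and blocks the computation. */
long checkpoint_overhead_s(long compute_s, long interval_s, long write_s) {
    return checkpoint_count(compute_s, interval_s) * write_s;
}

/* Disk space (MB) needed if the 'kept' most recent checkpoints of
   'size_mb' MB each are retained. */
long storage_mb(long size_mb, long kept) {
    return size_mb * kept;
}
```

With one hour (3600 s) of work and 30 s per checkpoint, checkpointing every 10 minutes adds 5 × 30 = 150 s (about 4%), while checkpointing every minute adds 59 × 30 = 1770 s, almost half the original run time. Storage grows with the number of checkpoint files retained, which is why keeping only the most recent checkpoint (or the last few) limits storage needs.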

Do you have sample code?

Sample C++ code with a checkpoint-and-restart feature embedded in it is available in the checkpointing directory (main branch) of the ritua2/bsswfellowship repository on GitHub.

