• A survey of checkpointing algorithms for parallel and distributed computers

    • Fulltext

       

        Click here to view fulltext PDF


      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/025/05/0489-0510

    • Keywords

       

      Checkpointing algorithms; parallel & distributed computing; shared memory systems; rollback recovery; fault-tolerant systems

    • Abstract

       

      Checkpoint is defined as a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time.Checkpointing is the process of saving the status information. This paper surveys the algorithms which have been reported in the literature for checkpointing parallel/distributed systems. It has been observed that most of the algorithms published for checkpointing in message passing systems are based on the seminal article by Chandy and Lamport. A large number of articles have been published in this area by relaxing the assumptions made in this paper and by extending it to minimise the overheads of coordination and context saving. Checkpointing for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They however also include methods for storing the status of distributed memory in stable storage. Most of the algorithms assume that there is no knowledge about the programs being executed. It is however felt that in development of parallel programs the user has to do a fair amount of work in distributing tasks and this information can be effectively used to simplify checkpointing and rollback recovery.

    • Author Affiliations

       

      S Kalaiselvi1 V Rajaraman2

      1. Supercomputer Education and Research Centre (SERC), Indian Institute of Science, Bangalore - 560012, India
      2. Jawaharlal Nehru Centre for Advanced Scientific Research, Indian Institute of Science Campus, Bangalore - 560012, India
    • Dates

       
  • Sadhana | News

    • Editorial Note on Continuous Article Publication

      Posted on July 25, 2019

      Click here for Editorial Note on CAP Mode

© 2017-2019 Indian Academy of Sciences, Bengaluru.