Incremental Checkpointing with Application to Distributed Discrete Event Simulation
Thomas Huining Feng and Edward A. Lee
Winter Simulation Conference (WSC 2006), Monterey, CA, December 3-6, 2006
ABSTRACT
Checkpointing is widely used in robust fault-tolerant applications.
We present an efficient incremental checkpointing
mechanism. It records only the state changes
rather than the complete state. After the creation of a checkpoint,
state changes are logged incrementally as records
in memory, with which an application can spontaneously
roll back later. This incrementalism allows us to implement
checkpointing with high performance. Only a small
constant time is required for checkpoint creation and state
recording. Rollback requires linear time in the number
of recorded state changes, which is bounded by the number
of state variables times the number of checkpoints.
We implement a Java source transformer that automatically
converts an existing application into a behavior-preserving
one with checkpointing functionality. This transformation
is application-independent and application-transparent. A
wide range of applications can benefit from this technique.
So far, it has been applied to distributed discrete event
simulation using the Time Warp technique.
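
The mechanism summarized above can be illustrated with a minimal sketch. It is not the code produced by the source transformer; the names Log, IntVar, createCheckpoint, record, and rollback are hypothetical and chosen only for illustration. The sketch assumes that a variable is logged only on its first change after each checkpoint, which is one way to obtain the stated bound of state variables times checkpoints on the number of records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IncrementalCheckpointSketch {

    /** Hypothetical in-memory log of state changes (illustration only). */
    static final class Log {
        // Undo actions, most recent on top.
        private final Deque<Runnable> undo = new ArrayDeque<>();
        // marks.get(i) is the log size when checkpoint i was created.
        private final List<Integer> marks = new ArrayList<>();

        /** Constant time: remember the current log size. */
        int createCheckpoint() {
            marks.add(undo.size());
            return marks.size() - 1;
        }

        /** Id of the latest checkpoint, or -1 if none exists. */
        int current() {
            return marks.size() - 1;
        }

        /** Constant time: push one undo action. */
        void record(Runnable undoAction) {
            undo.push(undoAction);
        }

        /** Linear in the number of records made since checkpoint id. */
        void rollback(int id) {
            int target = marks.get(id);
            while (undo.size() > target) {
                undo.pop().run();
            }
            // Checkpoints created after id are no longer meaningful.
            marks.subList(id + 1, marks.size()).clear();
        }
    }

    /** Hypothetical state variable that logs its own changes. */
    static final class IntVar {
        private final Log log;
        private int value;
        // Latest checkpoint for which the old value is already logged.
        private int recordedAt = -1;

        IntVar(Log log, int initial) {
            this.log = log;
            this.value = initial;
        }

        int get() {
            return value;
        }

        void set(int newValue) {
            int checkpoint = log.current();
            // Log only the first change after each checkpoint.
            if (recordedAt < checkpoint) {
                final int oldValue = value;
                final int oldRecordedAt = recordedAt;
                log.record(() -> {
                    value = oldValue;
                    recordedAt = oldRecordedAt;
                });
                recordedAt = checkpoint;
            }
            value = newValue;
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        IntVar x = new IntVar(log, 1);

        int c0 = log.createCheckpoint();
        x.set(2);
        x.set(3); // not logged again: the value at c0 is already saved
        int c1 = log.createCheckpoint();
        x.set(4);

        log.rollback(c1);
        System.out.println(x.get()); // prints 3
        log.rollback(c0);
        System.out.println(x.get()); // prints 1
    }
}
```

In this sketch each undo action also restores the variable's bookkeeping field, so a change made after a rollback is logged again. In a setting such as Time Warp, a rollback like this would be triggered when an event arrives with a time stamp earlier than the state already computed.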