Adapting the DMTCP Plugin Model for Checkpointing of Hardware Emulation
Rohan Garg, Kapil Arya, Jiajun Cao, Gene Cooperman, Jeff Evans, Ankit Garg, Neil A. Rosenberg, K. Suresh

TL;DR
This paper presents an extension to DMTCP's plugin model enabling checkpointing and re-connection of hardware emulators, facilitating flexible, collaborative circuit testing, fault injection, and proprietary plugin integration.
Contribution
It introduces a process-level virtualization plugin model for DMTCP that supports hardware emulator disconnection and reconnection, enhancing circuit verification workflows.
Findings
Enables seamless checkpointing and restart with different hardware emulators.
Supports proprietary plugins for fault injection and testing.
Reduces initialization time during circuit simulation.
Abstract
Checkpoint-restart is now a mature technology. It allows a user to save and later restore the state of a running process. The new plugin model for the upcoming version 3.0 of DMTCP (Distributed MultiThreaded Checkpointing) is described here. This plugin model allows a target application to disconnect from the hardware emulator at checkpoint time and then re-connect to a possibly different hardware emulator at the time of restart. The DMTCP plugin model is important in allowing three distinct parties to seamlessly inter-operate. The three parties are: the EDA designer, who is concerned with formal verification of a circuit design; the DMTCP developers, who are concerned with providing transparent checkpointing during the circuit emulation; and the hardware emulator vendor, who provides a plugin library that responds to checkpoint, restart, and other events. The new plugin model is an…
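The disconnect-at-checkpoint, reconnect-at-restart behavior described in the abstract can be illustrated with a minimal sketch. This is not DMTCP's plugin API: the names `Event`, `Emulator`, `EmulatorPlugin`, `dump_state`, and `load_state` are all hypothetical, and the idea that the plugin copies emulator state out at checkpoint and into the new emulator at restart is an assumption made for illustration only.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event types delivered to a plugin."""
    CHECKPOINT = auto()
    RESTART = auto()


class Emulator:
    """Hypothetical stand-in for a hardware emulator holding circuit state."""

    def __init__(self, name):
        self.name = name
        self.state = {}

    def dump_state(self):
        # Return a copy so the snapshot is independent of the live emulator.
        return dict(self.state)

    def load_state(self, state):
        self.state = dict(state)


class EmulatorPlugin:
    """Hypothetical vendor plugin that reacts to checkpoint and restart events."""

    def __init__(self, emulator):
        self.emulator = emulator
        self.saved_state = None

    def handle(self, event, new_emulator=None):
        if event is Event.CHECKPOINT:
            # Assumption: save emulator state, then disconnect from it.
            self.saved_state = self.emulator.dump_state()
            self.emulator = None
        elif event is Event.RESTART:
            if new_emulator is None:
                raise ValueError("restart needs an emulator to connect to")
            # Re-connect to a possibly different emulator and restore state.
            new_emulator.load_state(self.saved_state)
            self.emulator = new_emulator


# Usage: checkpoint while attached to one emulator, restart on another.
first = Emulator("emulator-A")
first.state["cycle"] = 1000
plugin = EmulatorPlugin(first)
plugin.handle(Event.CHECKPOINT)

second = Emulator("emulator-B")
plugin.handle(Event.RESTART, new_emulator=second)
```

In this sketch the target application only ever talks to `plugin`, so swapping the emulator between checkpoint and restart is invisible to it, which is the separation of roles among the three parties that the abstract describes.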
Taxonomy
Topics: Distributed systems and fault tolerance · Security and Verification in Computing · Real-Time Systems Scheduling
