DeLIAP and DeLIAJ: Dependability Library Interfaces for Python and Julia
Marcos Irigoyen, Carla Santana, Ramon C. F. Araújo, Samuel Xavier-de-Souza

TL;DR
This paper introduces Python and Julia interfaces for the fault tolerance library DeLIA, enabling easier implementation of fault-tolerant techniques in HPC applications with minimal overhead.
Contribution
It extends the DeLIA library to Python and Julia via wrappers, facilitating fault tolerance in these languages, and demonstrates the wrappers' efficiency with a practical application.
Findings
Median overhead of 1.4% in runtime
Wrappers enable fault-tolerance in Python and Julia
Application validation shows practical feasibility
Abstract
The ever-growing computational complexity of High Performance Computing applications is often met with horizontal scaling of computing systems. As a side effect, each added node risks becoming a single point of failure for parallel programs, increasing the demand for fault-tolerance techniques, especially at the software level. Under such conditions, the fault tolerance library DeLIA was developed in C/C++ with error detection and recovery features. We propose to extend the library's capabilities to Python and Julia through the wrappers DeLIAP and DeLIAJ in order to lower the barrier to entry for implementing fault-tolerant solutions in these languages, both of which lack alternatives to the library. To validate the efficiency of the wrappers, an application of the Julia wrapper in the 4D Full waveform inversion method was analyzed, quantitatively assessing the introduced overhead…
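The recovery idea the abstract refers to can be illustrated with a generic checkpoint/restart loop. The sketch below is not DeLIAP's API, which this summary does not show; the names `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` parameter are hypothetical and exist only for illustration. It assumes a single process whose state fits in a small dictionary.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_iterations, fail_at=None):
    """Sum 0..n_iterations-1, checkpointing after every iteration.

    fail_at (hypothetical) simulates a failure before the given iteration.
    """
    state = load_checkpoint(path, {"iteration": 0, "total": 0})
    for i in range(state["iteration"], n_iterations):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += i
        state["iteration"] = i + 1
        save_checkpoint(path, state)
    return state["total"]
```

Calling `run` a second time with the same path after a failure resumes from the last completed iteration instead of starting over. Checkpointing on every iteration is the simplest policy; a real application would likely checkpoint less often to keep the runtime overhead low.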
Taxonomy
Topics: Computational Physics and Python Applications
