Elzar: Triple Modular Redundancy using Intel Advanced Vector Extensions (technical report)
Dmitrii Kuvaiskii, Oleksii Oleksenko, Pramod Bhatotia, Pascal Felber, Christof Fetzer

TL;DR
Elzar is a compiler framework that uses Intel AVX vectorization to implement triple modular redundancy (TMR) for tolerating transient CPU faults. It aims to reduce the overheads of traditional instruction-level redundancy, with mixed results.
Contribution
This work introduces Elzar, a compiler-based approach that uses Intel AVX (SIMD) instructions to implement triple modular redundancy in unmodified multithreaded applications.
Findings
SIMD can reduce redundancy overheads for some workloads, e.g., CPU-intensive ones with many floating-point operations
Overheads vary significantly across applications
Potential improvements to AVX could enhance performance
Abstract
Instruction-Level Redundancy (ILR) is a well-known approach to tolerate transient CPU faults. It replicates instructions in a program and inserts periodic checks to detect and correct CPU faults using majority voting, which essentially requires three copies of each instruction and leads to high performance overheads. As SIMD technology can operate simultaneously on several copies of the data, it appears to be a good candidate for decreasing these overheads. To verify this hypothesis, we propose Elzar, a compiler framework that transforms unmodified multithreaded applications to support triple modular redundancy using Intel AVX extensions for vectorization. Our experience with several benchmark suites and real-world case studies yields mixed results: while SIMD may be beneficial for some workloads, e.g., CPU-intensive ones with many floating-point operations, it exhibits higher overhead…
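To make the core idea concrete, here is a minimal sketch in C of SIMD-based triple modular redundancy: one value is replicated into several lanes of a 256-bit vector (the width of an AVX YMM register), each arithmetic operation updates all copies at once, and a majority vote recovers the correct result after a simulated fault. This is not Elzar's implementation. It assumes GCC or Clang vector extensions (`vector_size`), and the lane layout (three copies in lanes 0..2, lane 3 unused) as well as the names `replicate`, `vote`, and `tmr_mul_add` are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Four 64-bit lanes = 256 bits, the width of an AVX YMM register.
   GCC/Clang vector extension: with -mavx2 the compiler can emit AVX2
   instructions; otherwise it lowers the operations to scalar code. */
typedef int64_t v4i64 __attribute__((vector_size(32)));

/* Hypothetical layout: three copies of x in lanes 0..2, lane 3 unused. */
static v4i64 replicate(int64_t x) {
    v4i64 v = {x, x, x, 0};
    return v;
}

/* Majority vote over lanes 0..2. On success, stores the value held by at
   least two copies in *out and returns 0; returns -1 if all three differ
   (an uncorrectable fault). */
static int vote(v4i64 v, int64_t *out) {
    if (v[0] == v[1] || v[0] == v[2]) {
        *out = v[0];
        return 0;
    }
    if (v[1] == v[2]) {
        *out = v[1];
        return 0;
    }
    return -1;
}

/* Computes a * b + c on replicated operands, flips one bit in copy 1 to
   simulate a transient fault, then votes. Returns the status from vote(). */
static int tmr_mul_add(int64_t a, int64_t b, int64_t c, int64_t *out) {
    v4i64 va = replicate(a), vb = replicate(b), vc = replicate(c);
    /* Each vector operation updates all copies at once, whereas scalar ILR
       would issue every multiply and add three times. */
    v4i64 y = va * vb + vc;
    y[1] ^= (int64_t)1 << 3;   /* simulated single-bit fault in copy 1 */
    assert(y[0] != y[1]);      /* the copies now disagree */
    return vote(y, out);
}
```

For example, `tmr_mul_add(6, 7, 5, &r)` returns 0 and sets `r` to 47: the corrupted copy holds 39, but the two intact copies outvote it. Where checks are placed and how lanes are used in practice is a design decision of the compiler pass, which this sketch does not attempt to reproduce.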
