A Domain Specific Language for Testing Consensus Implementations
Cezara Dragoi, Constantin Enea, Srinidhi Nagendra, Mandayam Srivas

TL;DR
This paper introduces Netrix, a domain-specific language and tool for testing implementations of distributed consensus protocols. It aims to improve coverage, make bug reproduction robust, and support regression testing across versions, and is demonstrated on Tendermint and Raft implementations.
Contribution
The paper presents a novel testing methodology and tool, Netrix, tailored for consensus algorithms, enabling better coverage, bug detection, and regression testing in distributed systems.
Findings
Identified 4 deviations of the Tendermint implementation from its protocol specification.
Reproduced 4 known bugs in Raft.
Validated the effectiveness of Netrix on multiple consensus protocols.
Abstract
Large-scale, fault-tolerant, distributed systems are the backbone for many critical software services. Since they must execute correctly in a possibly adversarial environment with arbitrary communication delays and failures, the underlying algorithms are intricate. In particular, achieving consistency and data retention relies on intricate consensus (state machine replication) protocols. Ensuring the reliability of implementations of such protocols remains a significant challenge because of the enormous number of exceptional conditions that may arise in production. We propose a methodology and a tool called Netrix for testing such implementations that aims to exploit programmers' knowledge to improve coverage, enables robust bug reproduction, and can be used in regression testing across different versions of an implementation. As an evaluation, we apply our tool to a popular proof of stake…
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Cloud Computing and Resource Management
