It's not a lie if you don't get caught: simplifying reconfiguration in SMR through dirty logs
Allen Clement, Natacha Crooks, Neil Giridharan, Alex Shamis

TL;DR
The paper introduces Gauss, a modular reconfiguration engine for state-machine replication that enables independent upgrades of components with minimal downtime, improving maintainability and system evolution.
Contribution
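Gauss distinguishes a consensus protocol's inner log from a sanitized outer log exposed to the RSM node. This separation allows independent reconfiguration and upgrades in SMR systems, which prior approaches could not offer because they tightly couple membership changes to a specific algorithm.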
Findings
Enables seamless protocol upgrades with minimal downtime
Supports independent reconfiguration of membership and failure thresholds
Demonstrates effective evolution of the SMR stack in the Rialo blockchain
Abstract
Production state-machine replication (SMR) implementations are complex, multi-layered architectures comprising data dissemination, ordering, execution, and reconfiguration components. Existing research consensus protocols rarely discuss reconfiguration. Those that do tightly couple membership changes to a specific algorithm. This prevents the independent upgrade of individual building blocks and forces expensive downtime when transitioning to new protocol implementations. Instead, modularity is essential for maintainability and system evolution in production deployments. We present Gauss, a reconfiguration engine designed to treat consensus protocols as interchangeable modules. By introducing a distinction between a consensus protocol's inner log and a sanitized outer log exposed to the RSM node, Gauss allows engineers to upgrade membership, failure thresholds, and the consensus…
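The inner/outer log distinction can be illustrated with a minimal sketch. This is not Gauss's actual design, which the truncated abstract does not spell out. It assumes, purely for illustration, that the inner log interleaves application commands with reconfiguration entries and that "sanitizing" means exposing only the application commands, in order, to the RSM node. The names `InnerEntry`, `sanitize`, and the `kind` tags are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InnerEntry:
    """One slot of the consensus protocol's inner log (hypothetical layout)."""
    kind: str      # assumed tags: "command" or "reconfig"
    payload: str


def sanitize(inner_log: List[InnerEntry]) -> List[str]:
    """Build the outer log: application commands only, in inner-log order.

    Assumption: reconfiguration entries are consumed by the reconfiguration
    engine and never reach the replicated state machine.
    """
    return [entry.payload for entry in inner_log if entry.kind == "command"]


inner_log = [
    InnerEntry("command", "put x=1"),
    InnerEntry("reconfig", "add replica r4"),
    InnerEntry("command", "put y=2"),
]

outer_log = sanitize(inner_log)
print(outer_log)
```

Under these assumptions, the state machine sees the same command sequence regardless of how many reconfiguration entries the consensus module interleaves, which is the property that would let the consensus module be swapped without touching the execution layer.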
Taxonomy
Topics: Distributed systems and fault tolerance · Software System Performance and Reliability · Cloud Computing and Resource Management
