Finding and Analyzing Crash-Consistency Bugs in Persistent-Memory File Systems
Hayley LeBlanc, Shankara Pailoor, Isil Dillig, James Bornholt, Vijay Chidambaram

TL;DR
This paper introduces FlyTrap, a framework that uncovers crash-consistency bugs in persistent-memory file systems, revealing critical logic errors and design flaws that impact system reliability and correctness.
Contribution
The paper presents FlyTrap, a novel testing framework that discovers new crash-consistency bugs in PM file systems, providing insights into common bug sources and guiding future system design and testing.
Findings
FlyTrap found 18 new bugs in four PM file systems.
Many bugs stem from logic errors rather than flush or fence misuse.
In-place metadata updates and recovery code are major bug sources.
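To make the distinction between logic errors and flush or fence misuse concrete, the toy model below enumerates the crash states of a rename. It is a minimal sketch and not FlyTrap's actual algorithm: the directory model, the operation tuples, and the function names are all hypothetical. It assumes every step reaches persistent memory completely and in program order, so flushes and fences are used correctly, and an atomicity violation can only come from the order of the updates themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: a directory maps file names to inode numbers.
Directory = Dict[str, int]
# An operation is ("set", name, inode) or ("del", name, 0).
Op = Tuple[str, str, int]
# A step is a group of operations assumed to persist atomically.
Step = List[Op]


def apply_steps(state: Directory, steps: List[Step]) -> Directory:
    """Return a copy of `state` with every operation in `steps` applied."""
    result = dict(state)
    for step in steps:
        for kind, name, inode in step:
            if kind == "set":
                result[name] = inode
            else:
                result.pop(name, None)
    return result


def crash_states(state: Directory, steps: List[Step]) -> List[Directory]:
    """A crash may occur before the first step or after any later step."""
    return [apply_steps(state, steps[:k]) for k in range(len(steps) + 1)]


def atomicity_violations(state: Directory, steps: List[Step]) -> List[Directory]:
    """Crash states that match neither the pre-state nor the post-state."""
    before = dict(state)
    after = apply_steps(state, steps)
    return [s for s in crash_states(state, steps) if s != before and s != after]


initial: Directory = {"a": 1, "b": 2}

# rename("a", "b") as two separate in-place updates.
in_place: List[Step] = [[("del", "a", 0)], [("set", "b", 1)]]

# The same rename with both updates committed as one atomic step,
# for example through a journal transaction (an assumption of this model).
journaled: List[Step] = [[("del", "a", 0), ("set", "b", 1)]]

if __name__ == "__main__":
    print("in-place violations:", atomicity_violations(initial, in_place))
    print("journaled violations:", atomicity_violations(initial, journaled))
```

In this model the in-place variant has one crash state in which "a" is gone while "b" still points to its old inode, even though each update was persisted correctly. A checker that only looks for missing flushes or fences would not report it, which is consistent with the finding that many bugs are logic errors.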
Abstract
We present a study of crash-consistency bugs in persistent-memory (PM) file systems and analyze their implications for file-system design and testing crash consistency. We develop FlyTrap, a framework to test PM file systems for crash-consistency bugs. FlyTrap discovered 18 new bugs across four PM file systems; the bugs have been confirmed by developers and many have already been fixed. The discovered bugs have serious consequences such as breaking the atomicity of rename or making the file system unmountable. We present a detailed study of the bugs we found and discuss some important lessons from these observations. For instance, one of our findings is that many of the bugs are due to logic errors, rather than errors in using flushes or fences; this has important applications for future work on testing PM file systems. Another key finding is that many bugs arise from attempts to…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance
