FastFI: Enhancing API Call-Site Robustness in Microservice-Based Systems with Fault Injection
Yuzhen Tan, Jian Wang, Shuaiyu Xie, Bing Li, Yunqing Yong, Neng Zhang, Shaolin Tan

TL;DR
FastFI is a framework that makes fault injection in microservice systems substantially faster (76.12% less injection time on average) by combining a specialized solver for combinatorial fault discovery with dynamic injection, and it uses the injection results to identify critical APIs for robustness enhancement.
Contribution
FastFI introduces a DFS-based solver with dynamic fault injection and uses the injection results to identify critical APIs, addressing limitations of prior lineage-driven approaches.
Findings
Reduces fault-injection time by 76.12% on average
Accurately identifies high-impact APIs for robustness
Maintains acceptable resource overhead
Abstract
Fault injection is a key technique for assessing software reliability, enabling proactive detection of system defects before they manifest in production. However, the increasing complexity of microservice architectures leads to exponential growth in the fault-injection space, rendering traditional random injection inefficient. Recent lineage-driven approaches mitigate this problem through heuristic pruning, but they face two limitations. First, combinatorial-fault discovery remains bottlenecked by general-purpose SAT solvers, which fail to exploit the monotone and low-overlap structure of derived CNF formulas and typically rely on a static upper bound on fault size. Second, existing techniques provide limited post-injection guidance beyond reporting detected faults. To address these challenges, we propose FastFI, a fault-injection-guided framework to enhance the robustness of API call…
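The abstract contrasts general-purpose SAT solvers with a solver that exploits the monotone structure of lineage-derived CNF formulas. The sketch below illustrates that idea only; it is not FastFI's algorithm. It assumes, as in lineage-driven fault injection, that each clause lists the alternative faults that would cut one support path of a successful request. Under that assumption, a fault set "satisfies" the formula exactly when it intersects every clause, so the minimal fault combinations are the minimal hitting sets. The function `minimal_hitting_sets` and the service names in the example are hypothetical. The search depth is bounded by the number of clauses, not by a preset cap on fault size.

```python
from typing import FrozenSet, List, Set

# Hypothetical representation: a clause is a set of candidate faults (e.g. call sites),
# any one of which would break one support path of a successful request.
Clause = FrozenSet[str]


def minimal_hitting_sets(clauses: List[Clause]) -> List[FrozenSet[str]]:
    """Enumerate inclusion-minimal fault sets that hit every clause, via DFS branching."""
    found: Set[FrozenSet[str]] = set()

    def dfs(chosen: FrozenSet[str]) -> None:
        # Prune: extending a superset of a known solution can never yield a minimal set.
        if any(sol <= chosen for sol in found):
            return
        # Pick the first clause (support path) the current fault set does not yet break.
        unhit = next((c for c in clauses if not (c & chosen)), None)
        if unhit is None:
            found.add(chosen)  # every clause is hit: chosen is a candidate fault combination
            return
        # Branch on each fault that would break this path; sorted for deterministic order.
        for fault in sorted(unhit):
            dfs(chosen | {fault})

    dfs(frozenset())
    # DFS order may record a non-minimal set before one of its subsets; drop those.
    minimal = [s for s in found if not any(other < s for other in found)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    # Two hypothetical support paths for one request: both depend on "payment",
    # and they differ in whether "inventory" or "inventory_cache" is used.
    paths = [frozenset({"inventory", "payment"}), frozenset({"inventory_cache", "payment"})]
    print([sorted(s) for s in minimal_hitting_sets(paths)])
    # prints [['payment'], ['inventory', 'inventory_cache']]
```

Because every literal is positive, no branch ever needs to undo a choice, and supersets of known solutions can be skipped outright. A general SAT solver would not exploit these shortcuts.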
Taxonomy
Topics: Software System Performance and Reliability · Software Testing and Debugging Techniques · Cloud Computing and Resource Management
