Revisiting Code Debloating with Ground Truth-based Evaluation
Muhammad Bilal, Moiz Ali, Mohit Kumar, Fareed Zaffar, Fahad Shaon, Ashish Gehani, Sazzadur Rahaman

TL;DR
This paper critically evaluates application-level code debloating techniques using ground-truth-based metrics, revealing significant inaccuracies in existing tools and emphasizing the need for standardized assessment methodologies.
Contribution
The paper introduces a ground-truth-based evaluation paradigm for debloating and uses it to analyze eight tools across multiple abstraction levels, exposing their actual effectiveness and limitations.
Findings
Dynamic tools often remove excessive code, risking correctness.
Static tools tend to retain too much code due to coarse analysis.
Inaccurate debloating can cause functional issues and security vulnerabilities.
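To make the over- and under-debloating findings concrete, here is a minimal sketch of scoring a debloated program against a ground-truth set of required functions. It is an illustration under stated assumptions, not the paper's evaluation methodology: the function-level granularity, the names `score_debloating` and `DebloatScore`, the precision/recall framing, the convention of returning 1.0 when a denominator is empty, and the toy function sets are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DebloatScore:
    """Hypothetical ground-truth comparison for one debloated program."""
    over_removed: frozenset   # required code the tool deleted (correctness risk)
    under_removed: frozenset  # unneeded code the tool kept (residual bloat)
    precision: float          # share of removed code that was truly unneeded
    recall: float             # share of truly unneeded code that was removed


def score_debloating(original: set[str], kept: set[str], required: set[str]) -> DebloatScore:
    """Compare a tool's output with ground truth at (assumed) function granularity.

    original: every function in the unmodified program
    kept:     functions the debloating tool retained
    required: ground-truth functions needed for the target features
    """
    removed = original - kept
    unneeded = original - required
    correctly_removed = removed & unneeded
    # Assumption: an empty denominator counts as a perfect score.
    precision = len(correctly_removed) / len(removed) if removed else 1.0
    recall = len(correctly_removed) / len(unneeded) if unneeded else 1.0
    return DebloatScore(
        over_removed=frozenset(removed & required),
        under_removed=frozenset(kept & unneeded),
        precision=precision,
        recall=recall,
    )


if __name__ == "__main__":
    # Toy program; all function names are hypothetical.
    original = {"main", "parse_args", "compress", "decompress", "log", "net_fetch"}
    required = {"main", "parse_args", "decompress", "log"}

    # Aggressive (dynamic-style) tool: also drops "log", which is required.
    dyn = score_debloating(original, {"main", "parse_args", "decompress"}, required)
    # Conservative (static-style) tool: removes only "net_fetch".
    sta = score_debloating(original, original - {"net_fetch"}, required)

    for name, s in (("dynamic-style", dyn), ("static-style", sta)):
        print(name, sorted(s.over_removed), sorted(s.under_removed),
              round(s.precision, 2), round(s.recall, 2))
```

In this toy case the aggressive tool removes half of the functions and would look better under a code-size proxy, yet it deletes `log`, which the ground truth requires. The conservative tool breaks nothing but leaves `compress` in place. A size-only metric cannot distinguish these two failure modes, which is the gap a ground-truth comparison is meant to expose.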
Abstract
Program debloating aims to remove unused code to reduce performance overhead, attack surfaces, and maintenance costs. Over time, debloating has evolved across multiple layers (container, library, and application), each building on the principles of application-level debloating. Despite its central role, application-level debloating continues to rely on imperfect proxies for measuring performance, such as test-case-driven evaluation for correctness, code size for runtime efficiency, and gadget count reduction for estimating security posture. While there is widespread skepticism about using such imperfect proxies, the community still lacks standardized methodologies or benchmarks to assess the true performance of application-level software debloating. This experience paper aims to address the gap. We revisit the foundations of application-level debloating through a ground-truth-based…
