Towards an Oracle for Binary Decomposition Under Compilation Variance
Ang Jia, He Jiang, Zhilei Ren, Xiaochen Li, Zhipeng Yang, Yaxin Duan, Ming Fan, Ting Liu

TL;DR
This paper introduces an empirical framework and oracle for evaluating binary decomposition methods under compilation variance, revealing limitations of existing techniques and emphasizing the need for compilation-aware solutions in third-party library detection.
Contribution
The paper develops a systematic empirical evaluation framework and oracle for binary decomposition under compilation variance, providing a rigorous basis for assessing and improving TPL detection methods.
Findings
Existing methods suffer from under- or over-aggregation failures.
Current decomposition techniques are inadequate for robust TPL detection.
Compilation variance significantly impacts binary decomposition accuracy.
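Under-aggregation and over-aggregation can be made concrete by comparing a predicted decomposition with a ground-truth one. The sketch below scores decompositions with pairwise precision and recall. This metric is our illustrative assumption, not necessarily the one used in the paper. Low precision means unrelated functions were merged into one module (over-aggregation). Low recall means one library's functions were split across modules (under-aggregation). The function names (`zlib_a`, `app_main`, and so on) are hypothetical.

```python
from itertools import combinations


def same_module_pairs(partition):
    """Return the set of unordered function pairs placed in the same module."""
    pairs = set()
    for module in partition:
        for a, b in combinations(sorted(module), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, truth):
    """Pairwise precision/recall of a predicted decomposition vs. ground truth.

    Low precision -> over-aggregation (unrelated functions merged).
    Low recall    -> under-aggregation (one true module split apart).
    """
    pred_pairs = same_module_pairs(predicted)
    true_pairs = same_module_pairs(truth)
    hits = len(pred_pairs & true_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical ground truth: one library module and one application module.
truth = [{"zlib_a", "zlib_b", "zlib_c"}, {"app_main", "app_util"}]
# Under-aggregation: the library is split into two modules.
under = [{"zlib_a", "zlib_b"}, {"zlib_c"}, {"app_main", "app_util"}]
# Over-aggregation: library and application are merged into one module.
over = [{"zlib_a", "zlib_b", "zlib_c", "app_main", "app_util"}]

print(pairwise_scores(under, truth))  # (1.0, 0.5)
print(pairwise_scores(over, truth))   # (0.4, 1.0)
```

In this scoring, splitting a library costs recall but not precision. Merging modules costs precision but not recall. The two failure modes therefore show up as different metric signatures.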
Abstract
Third-Party Library (TPL) detection, which identifies reused libraries in binary code, is critical for software security analysis. At its core, TPL detection depends on binary decomposition, the process of partitioning a monolithic binary into cohesive modules. Existing decomposition methods, whether anchor-based or clustering-based, fundamentally rely on the assumption that reused code exhibits similar function call relationships. However, Function Call Graph (FCG) variations introduced by diverse compilation settings severely undermine this assumption. Function inlining decisions are a particular problem because they drastically alter FCG structures. In this work, we conduct the first systematic empirical study to establish the oracle for optimal binary decomposition under compilation variance. We first develop a labeling method to create precise FCG mappings on a comprehensive dataset compiled…
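To illustrate why inlining can break call-relationship assumptions, the sketch below models an FCG as a caller-to-callees dictionary. It treats a simple "anchor-based" module as every function reachable from a library anchor function. That reachability rule is a simplifying assumption, not the paper's method. The `inline` helper is also hypothetical: it assumes the inlined callee has no other callers, so its node disappears and its outgoing edges move to the caller. All function names are made up for the example.

```python
from collections import deque


def reachable(fcg, root):
    """Functions reachable from root by following call edges (BFS)."""
    if root not in fcg:
        return set()  # the anchor no longer exists in this FCG
    seen, queue = {root}, deque([root])
    while queue:
        for callee in fcg.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def inline(fcg, caller, callee):
    """Simulate inlining callee into caller.

    Simplification: assumes callee has no other callers, so its node is
    removed and the caller inherits the callee's outgoing call edges.
    """
    new = {f: list(cs) for f, cs in fcg.items() if f != callee}
    merged = [c for c in new[caller] if c != callee] + fcg[callee]
    new[caller] = list(dict.fromkeys(merged))  # dedupe, keep order
    return new


# Hypothetical FCG at a low optimization level: zlib_inflate anchors a library.
fcg_o0 = {
    "main": ["parse", "zlib_inflate"],
    "parse": ["helper"],
    "helper": [],
    "zlib_inflate": ["zlib_window", "zlib_crc"],
    "zlib_window": [],
    "zlib_crc": [],
}
# Hypothetical higher optimization level: the anchor is inlined into main.
fcg_o2 = inline(fcg_o0, "main", "zlib_inflate")

print(sorted(reachable(fcg_o0, "zlib_inflate")))  # ['zlib_crc', 'zlib_inflate', 'zlib_window']
print(sorted(reachable(fcg_o2, "zlib_inflate")))  # []
print(sorted(reachable(fcg_o2, "main")))
# ['helper', 'main', 'parse', 'zlib_crc', 'zlib_window']
```

In the inlined variant, the anchor vanishes and the remaining library functions become reachable only from application code. A reachability-based grouping then folds them into the application module, which is the kind of over-aggregation the findings describe.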
