
TL;DR
This paper introduces a basic block reordering algorithm that models cache effects alongside fall-through branches and uses machine learning to select its parameters, outperforming existing methods in optimizing binary performance across diverse workloads.
Contribution
A novel reordering algorithm that models cache effects and uses machine learning to optimize binary performance, surpassing traditional fall-through maximization techniques.
Findings
Outperforms existing block reordering techniques across the evaluated applications.
Improves the performance of applications with large code size.
Validated on real-world workloads including Facebook and SPEC benchmarks.
Abstract
Basic block reordering is an important step for profile-guided binary optimization. The state-of-the-art goal for basic block reordering is to maximize the number of fall-through branches. However, we demonstrate that such orderings may impose suboptimal performance on instruction and I-TLB caches. We propose a new algorithm that relies on a model combining the effects of fall-through and caching behavior. As the details of modern processor caching are quite complex and often unknown, we show how to use machine learning in selecting parameters that best trade off different caching effects to maximize binary performance. An extensive evaluation on a variety of applications, including Facebook production workloads, the open-source compilers Clang and GCC, and SPEC CPU benchmarks, indicates that the new method outperforms existing block reordering techniques, improving the resulting…
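To make the idea of a model that combines fall-through and caching behavior concrete, here is a minimal sketch of a layout scoring function. It is an illustration under stated assumptions, not the paper's implementation: the function name layout_score, the linear decay with jump distance, and every weight and window value are hypothetical placeholders. In the approach described above, parameters of this kind are what machine learning would select.

```python
from typing import Dict, List, Tuple

# Hypothetical parameters, chosen only for illustration (not taken from the paper).
FALLTHROUGH_WEIGHT = 1.0   # full credit when the target directly follows the source
FORWARD_WEIGHT = 0.1       # partial credit for a short forward jump
BACKWARD_WEIGHT = 0.1      # partial credit for a short backward jump
FORWARD_WINDOW = 1024      # bytes; forward jumps at or beyond this distance score 0
BACKWARD_WINDOW = 640      # bytes; backward jumps at or beyond this distance score 0


def layout_score(
    order: List[str],
    sizes: Dict[str, int],
    edges: Dict[Tuple[str, str], int],
) -> float:
    """Score a block layout; higher is better under this assumed model.

    order: block names in layout order.
    sizes: block name -> size in bytes (assumed positive).
    edges: (source, target) -> profiled execution count of that branch.
    """
    # Assign each block a start address by laying blocks out back to back.
    addr: Dict[str, int] = {}
    offset = 0
    for block in order:
        addr[block] = offset
        offset += sizes[block]

    score = 0.0
    for (src, dst), count in edges.items():
        src_end = addr[src] + sizes[src]
        dst_start = addr[dst]
        if dst_start == src_end:
            # Fall-through: the target starts where the source ends.
            score += FALLTHROUGH_WEIGHT * count
        elif dst_start > src_end:
            # Forward jump: credit decays linearly with distance.
            dist = dst_start - src_end
            if dist < FORWARD_WINDOW:
                score += FORWARD_WEIGHT * count * (1 - dist / FORWARD_WINDOW)
        else:
            # Backward jump (including self-loops).
            dist = src_end - dst_start
            if dist < BACKWARD_WINDOW:
                score += BACKWARD_WEIGHT * count * (1 - dist / BACKWARD_WINDOW)
    return score


# Toy control-flow graph: A -> B -> C is hot, A -> C is cold.
sizes = {"A": 16, "B": 16, "C": 16}
edges = {("A", "B"): 100, ("A", "C"): 10, ("B", "C"): 100}

hot_path_layout = layout_score(["A", "B", "C"], sizes, edges)
cold_path_layout = layout_score(["A", "C", "B"], sizes, edges)
print(hot_path_layout > cold_path_layout)
```

A pure fall-through objective would only count the first branch of the if statement. The two extra branches are what let a model of this shape prefer layouts that keep frequently taken jumps short, which is one plausible proxy for instruction cache and I-TLB locality.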
