Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
Shigeo Mitsunari, Takashi Hoshino

TL;DR
This paper introduces an optimized method for 32-bit unsigned division by constants on 64-bit CPUs, significantly improving performance in compiler-generated code.
Contribution
It presents a new optimization technique tailored for 32-bit division by constants on 64-bit architectures, enhancing existing compiler implementations.
Findings
Achieved a 1.67x speedup on Intel Xeon w9-3495X (Sapphire Rapids) in the authors' microbenchmark.
Achieved a 1.98x speedup on Apple M4 in the same microbenchmark.
The LLVM patch implementing this method has been merged into llvm:main.
Abstract
Granlund and Montgomery proposed an optimization method for unsigned integer division by constants [3]. Their method (called the GM method in this paper) was further improved in part by works such as [1] and [7], and is now adopted by major compilers including GCC, Clang, Microsoft Compiler, and Apple Clang. However, the code these compilers generate for a division such as x/7 is designed for 32-bit CPUs and therefore does not fully exploit 64-bit capabilities. This paper proposes an optimization method for 32-bit unsigned division by constants targeting 64-bit CPUs. We implemented patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X (Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the microbenchmark described later. The LLVM patch has already been merged into llvm:main [6], demonstrating the practical applicability of the proposed method.
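To make the x/7 case concrete, the following C sketch contrasts the multiply, subtract, shift, and add sequence that compilers commonly emit for 32-bit x/7 with one possible single-multiply formulation that relies on a 64x64-to-128-bit product. This is an illustration under stated assumptions, not necessarily the exact code produced by the authors' patches: the function names `div7_gm32` and `div7_wide64` are hypothetical, and `unsigned __int128` is a GCC/Clang extension rather than standard C. The constant 0x124924925 is ceil(2^35 / 7), which needs 33 bits and is the reason the 32-bit sequence requires the extra fix-up steps.

```c
#include <stdint.h>

/* Commonly seen 32-bit sequence for x / 7.
 * The full magic constant ceil(2^35 / 7) = 0x124924925 needs 33 bits,
 * so only its low 32 bits (0x24924925) are multiplied, and the implicit
 * top bit is restored by the subtract/shift/add fix-up. */
static uint32_t div7_gm32(uint32_t x) {
    uint32_t t = (uint32_t)(((uint64_t)x * UINT32_C(0x24924925)) >> 32);
    return (t + ((x - t) >> 1)) >> 2;
}

/* Hypothetical 64-bit variant (an assumption for illustration):
 * pre-shift the 33-bit magic constant left by 64 - 35 = 29 bits so that
 * the quotient is exactly the high 64 bits of a 64x64 -> 128-bit product.
 * Relies on the GCC/Clang `unsigned __int128` extension. */
static uint32_t div7_wide64(uint32_t x) {
    const uint64_t magic  = UINT64_C(0x124924925); /* ceil(2^35 / 7) */
    const uint64_t scaled = magic << 29;           /* still fits in 64 bits */
    return (uint32_t)(((unsigned __int128)x * scaled) >> 64);
}
```

On a 64-bit target the second form maps naturally to a single widening multiply (or a high-half multiply instruction) with no fix-up, which is the kind of 64-bit capability the abstract says the 32-bit-oriented sequence leaves unused.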
