TL;DR
This paper shows that a vectorized approach using AVX2 SIMD instructions can count the ones in a binary stream up to twice as fast as the dedicated popcnt instruction on recent Intel processors. The technique benefits database, information-retrieval, cryptographic and machine-learning applications, and it has been adopted by LLVM.
Contribution
The paper introduces a SIMD-based population-count method that can run faster than the processor's dedicated instruction. The method has been adopted by LLVM, whose C compiler (clang) uses it.
Findings
The vectorized approach can be twice as fast as the dedicated popcnt instruction on recent Intel processors.
The gains can be even greater for similarity measures (e.g., the Jaccard index) that require additional Boolean operations.
The method has been adopted by LLVM and is used by its C compiler (clang).
Abstract
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g., the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (clang).
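To make the two operations named in the abstract concrete, here is a minimal C sketch of the scalar baseline: a population count over an array of 64-bit words, and a Jaccard index computed from the counts of the bitwise AND and OR. This is not the paper's vectorized method. The function names `popcount_words` and `jaccard_index` are hypothetical, `__builtin_popcountll` is a GCC/Clang builtin (it typically compiles to popcnt when the target supports it), and returning 1.0 for two empty sets is an assumed convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar baseline (hypothetical helper): count the set bits in n 64-bit words.
 * __builtin_popcountll is a GCC/Clang builtin, not standard C. */
static uint64_t popcount_words(const uint64_t *data, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (uint64_t)__builtin_popcountll(data[i]);
    }
    return total;
}

/* Jaccard index of two bitsets: |A AND B| / |A OR B|.
 * Each word needs two Boolean operations in addition to the bit counts. */
static double jaccard_index(const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t inter = 0;
    uint64_t uni = 0;
    for (size_t i = 0; i < n; i++) {
        inter += (uint64_t)__builtin_popcountll(a[i] & b[i]);
        uni += (uint64_t)__builtin_popcountll(a[i] | b[i]);
    }
    /* Assumed convention: two empty sets are treated as identical. */
    if (uni == 0) {
        return 1.0;
    }
    return (double)inter / (double)uni;
}
```

For example, with `a = {0xF, 0x1}` and `b = {0x3, 0x0}`, the intersection has 2 set bits and the union has 5, so the index is 0.4. Loops of this kind are the baseline against which the abstract compares the vectorized approach.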
