ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM
Yusuke Matsui, Yoshiki Imaizumi, Naoya Miyamoto, Naoki Yoshifuji

TL;DR
This paper introduces an ARM-optimized 4-bit product quantization (PQ) method that accelerates approximate nearest neighbor search using NEON SIMD instructions, achieving a tenfold speedup over naive PQ at the same accuracy.
Contribution
The paper presents an ARM-specific SIMD acceleration technique for 4-bit PQ. Fast 4-bit PQ has so far depended on x64-specific SIMD registers such as AVX2; this work removes that dependency and enables efficient approximate nearest neighbor search on ARM devices.
Findings
Achieves a 10x speedup over naive PQ with the same accuracy
Bundles two 128-bit registers into one 256-bit component and applies ARM NEON shuffle instructions to each
Delivers the speedup consistently across the reported experiments
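For context, the naive baseline can be sketched as a scalar table lookup. This is a minimal illustration, not the paper's code: it assumes each sub-space has 16 centroids (4-bit codes), that two codes are packed per byte with the low nibble first, and that `lut` is a precomputed table of query-to-centroid distances. The function name and data layout are hypothetical.

```python
def naive_adc_4bit(packed_codes, lut):
    """Sum per-sub-space distances for one database vector.

    packed_codes: bytes, two 4-bit codes per byte
                  (assumed layout: low nibble = even sub-space, high nibble = odd).
    lut:          list of M rows, each holding 16 query-to-centroid distances.
    """
    total = 0
    for i, byte in enumerate(packed_codes):
        lo = byte & 0x0F   # code for sub-space 2*i
        hi = byte >> 4     # code for sub-space 2*i + 1
        total += lut[2 * i][lo] + lut[2 * i + 1][hi]
    return total


# Toy table with M = 4 sub-spaces: lut[m][k] = 16*m + k
lut = [[16 * m + k for k in range(16)] for m in range(4)]
packed = bytes([0x21, 0x43])  # codes 1, 2, 3, 4 for sub-spaces 0..3
distance = naive_adc_4bit(packed, lut)
```

Each lookup here touches one code at a time, which is the cost that the SIMD shuffle approach is meant to amortize across many codes per instruction.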
Abstract
We accelerate the 4-bit product quantization (PQ) on the ARM architecture. Notably, the drastic performance of the conventional 4-bit PQ strongly relies on x64-specific SIMD register, such as AVX2; hence, we cannot yet achieve such good performance on ARM. To fill this gap, we first bundle two 128-bit registers as one 256-bit component. We then apply shuffle operations for each using the ARM-specific NEON instruction. By making this simple but critical modification, we achieve a dramatic speedup for the 4-bit PQ on an ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over the naive PQ with the same accuracy.
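The bundling idea in the abstract can be emulated in plain Python to show the data flow. This is a sketch under stated assumptions rather than the authors' implementation: a 128-bit register is modeled as a list of 16 bytes, the bundled 256-bit component as a pair of such lists, and the shuffle as a 16-entry byte table lookup that yields 0 for out-of-range indices (the behavior of NEON table-lookup intrinsics such as `vqtbl1q_u8`; the abstract does not name the instruction used). The helper names and the packing layout are hypothetical.

```python
def tbl16(table, idx):
    """Emulate a 16-lane byte table lookup: out[i] = table[idx[i]], or 0 if idx[i] >= 16."""
    assert len(table) == 16 and len(idx) == 16
    return [table[j] if j < 16 else 0 for j in idx]


def shuffle_bundled(tables, indices):
    """Apply the lookup to each half of a bundled pair.

    tables, indices: pairs of 16-byte lists, each standing in for a 128-bit register.
    """
    return (tbl16(tables[0], indices[0]), tbl16(tables[1], indices[1]))


def adc_16_vectors(packed, lut0, lut1):
    """Partial distances for 16 database vectors over two sub-spaces.

    packed: 16 bytes, one per vector (assumed layout: low nibble = sub-space 0
            code, high nibble = sub-space 1 code).
    lut0, lut1: 16-entry distance tables, assumed quantized to 8 bits.
    """
    lo = [b & 0x0F for b in packed]  # indices into lut0
    hi = [b >> 4 for b in packed]    # indices into lut1
    d0, d1 = shuffle_bundled((lut0, lut1), (lo, hi))
    return [a + b for a, b in zip(d0, d1)]


lut0 = list(range(16))
lut1 = [10 * k for k in range(16)]
packed = [(k << 4) | (15 - k) for k in range(16)]
sums = adc_16_vectors(packed, lut0, lut1)
```

On real hardware the two lookups would each process 16 lanes in one instruction, so a bundled pair covers the 32 byte lanes that a single 256-bit AVX2 shuffle handles on x64.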
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques · Image Retrieval and Classification Techniques
