TL;DR
This paper presents an FPGA-based Top-K SpMV design for approximate embedding similarity search. On HBM-based boards it is 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power efficiency.
Contribution
The paper introduces a novel FPGA architecture for Top-K SpMV that uses reduced precision and packet-wise CSR compression to enhance bandwidth efficiency.
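To illustrate why reduced precision helps a bandwidth-bound workload, the following is a minimal sketch, not the paper's actual number format. It assumes matrix values lie in [0, 1] and uses a simple unsigned fixed-point encoding; the function names and the bit widths in the usage lines are hypothetical choices made for illustration only.

```python
def quantize(value, bits):
    """Map a value in [0, 1] to an unsigned fixed-point code with `bits` bits.

    Assumption: values are already normalized to [0, 1].
    """
    levels = (1 << bits) - 1
    return round(value * levels)


def dequantize(code, bits):
    """Recover an approximate value in [0, 1] from its fixed-point code."""
    levels = (1 << bits) - 1
    return code / levels


def bytes_per_nonzero(value_bits, index_bits):
    """Memory traffic per stored non-zero: one value plus one column index."""
    return (value_bits + index_bits) / 8


# Hypothetical comparison: 32-bit values and indices vs. a narrower encoding.
baseline = bytes_per_nonzero(32, 32)  # 8.0 bytes per non-zero
reduced = bytes_per_nonzero(8, 24)    # 4.0 bytes per non-zero
approx = dequantize(quantize(0.33, 8), 8)
```

Under these assumed widths, each non-zero moves half as many bytes, so the same memory bandwidth streams twice as many non-zeros, at the cost of a small rounding error in each value.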
Findings
100x faster than a multi-threaded CPU implementation
2x faster than a GPU with 20% higher bandwidth
14.2x higher power efficiency
Abstract
Top-K SpMV is a key component of similarity-search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Instead, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power-efficiency.
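For readers unfamiliar with the operation, Top-K SpMV multiplies a sparse matrix of embeddings by a dense query vector and keeps only the K rows with the highest scores. The following is a minimal software reference sketch using standard CSR arrays and a bounded min-heap. It is not the paper's FPGA design and does not model its packet-wise CSR layout; the function name `topk_spmv` and the sample data are hypothetical.

```python
import heapq


def topk_spmv(indptr, indices, data, x, k):
    """Return the k (score, row) pairs with the largest row-by-vector products.

    indptr, indices, data: standard CSR arrays of the sparse matrix.
    x: dense query vector.
    """
    heap = []  # min-heap holding the best k (score, row) pairs seen so far
    for row in range(len(indptr) - 1):
        score = 0.0
        # Dot product over the non-zeros of this row only.
        for j in range(indptr[row], indptr[row + 1]):
            score += data[j] * x[indices[j]]
        if len(heap) < k:
            heapq.heappush(heap, (score, row))
        elif (score, row) > heap[0]:
            # Evict the current worst candidate.
            heapq.heapreplace(heap, (score, row))
    return sorted(heap, reverse=True)


# Hypothetical 4x5 sparse embedding matrix in CSR form.
indptr = [0, 2, 3, 5, 6]
indices = [0, 3, 1, 0, 4, 2]
data = [0.5, 0.5, 1.0, 0.25, 0.75, 1.0]
query = [1.0, 0.0, 2.0, 1.0, 0.0]

result = topk_spmv(indptr, indices, data, query, 2)
# result == [(2.0, 3), (1.0, 0)]
```

Each row is touched exactly once and only K candidates are kept, so the work is dominated by streaming the non-zeros from memory. This is why bandwidth efficiency, rather than raw compute, determines performance for this workload.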
