Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs

Alberto Parravicini; Luca Giuseppe Cellamare; Marco Siracusa; Marco Domenico Santambrogio

arXiv:2103.04808·cs.AR·March 9, 2021

TL;DR

This paper presents an FPGA-based Top-K SpMV design for approximate embedding similarity search. On HBM-based boards it is 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power-efficiency.

Contribution

The paper introduces a novel FPGA architecture for Top-K SpMV that uses reduced precision and packet-wise CSR compression to enhance bandwidth efficiency.
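
To make the workload concrete, here is a minimal software sketch of Top-K SpMV over a matrix in standard CSR form: each row (an embedding) is multiplied with a dense query vector, and only the K highest-scoring rows are kept. This is a plain reference under stated assumptions, not the paper's FPGA design or its packet-wise CSR layout; the function name `topk_spmv` and the toy matrix are hypothetical.

```python
import heapq


def topk_spmv(indptr, indices, data, x, k):
    """Return the k largest (score, row) pairs of a CSR matrix times dense x.

    indptr, indices, data follow the usual CSR convention: the non-zeros of
    row r are data[indptr[r]:indptr[r + 1]], located at columns
    indices[indptr[r]:indptr[r + 1]].
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (score, row); never holds more than k entries
    for row in range(len(indptr) - 1):
        score = 0.0
        for p in range(indptr[row], indptr[row + 1]):
            score += data[p] * x[indices[p]]
        if len(heap) < k:
            heapq.heappush(heap, (score, row))
        elif score > heap[0][0]:
            # The new score beats the weakest of the current top k.
            heapq.heapreplace(heap, (score, row))
    return sorted(heap, reverse=True)


# Toy 4x3 matrix (illustrative values only):
# row 0: [1, 0, 2]
# row 1: [0, 3, 0]
# row 2: [0, 0, 0]
# row 3: [4, 0, 1]
indptr = [0, 2, 3, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 1.0]
query = [1.0, 0.5, 2.0]

print(topk_spmv(indptr, indices, data, query, 2))
```

Only the K best scores are ever stored, so the output is tiny compared with the matrix that has to be streamed from memory. That is why the workload is bound by memory bandwidth rather than by arithmetic.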

Findings

01. 100x faster than a multi-threaded CPU implementation

02. 2x faster than a GPU that has 20% higher bandwidth

03. 14.2x higher power-efficiency

Abstract

Top-K SpMV is a key component of similarity-search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Instead, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power-efficiency.
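
The abstract's "reduced precision" can be illustrated with a generic unsigned fixed-point encoding: fewer bits per value means more non-zeros fit in each memory transfer, at the cost of a bounded rounding error. The sketch below is an assumption-laden illustration, not the paper's number format; the helper names `quantize` and `dequantize`, the 8-bit width, and the restriction of values to [0, 1) are all hypothetical.

```python
def quantize(values, frac_bits):
    """Encode values in [0, 1) as unsigned integers with frac_bits fractional bits."""
    scale = 1 << frac_bits
    # Round to the nearest code, clamping so the result fits in frac_bits bits.
    return [min(int(round(v * scale)), scale - 1) for v in values]


def dequantize(codes, frac_bits):
    """Map fixed-point codes back to floating-point values."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


# Hypothetical embedding values and bit width, for illustration only.
values = [0.0, 0.25, 0.3, 0.5]
codes = quantize(values, 8)
restored = dequantize(codes, 8)

# Away from the clamp at the top of the range, the rounding error is at most
# half of one quantization step, i.e. 1 / 2**(frac_bits + 1).
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(codes, max_error)
```

Since the task is approximate similarity search, small per-value errors of this kind matter only when they change which rows land in the top K.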
