RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication

Mohsen Dehghankar; Abolfazl Asudeh

arXiv:2603.27462 · cs.DS · March 31, 2026

TL;DR

RSR-core is a high-performance engine that accelerates low-bit matrix-vector multiplication for neural network inference on CPU and GPU, with reported speedups of up to 62x on CPU over baseline PyTorch multiplication and up to 1.9x for token generation on CUDA.

Contribution

The paper introduces RSR-core, an optimized low-level implementation of the RSR algorithm for efficient low-bit matrix-vector multiplication in inference pipelines.

Findings

01. Up to 62x speedup on CPU over baseline PyTorch multiplication.

02. Up to 1.9x speedup for token generation on CUDA.

03. Supports binary and ternary weight matrices in practical deployments.

Abstract

Matrix-vector multiplication is a fundamental building block in neural networks, vector databases, and large language models, particularly during inference. As a result, efficient matrix-vector multiplication engines directly translate into more efficient inference. Recent work has explored low-bit quantization of model weights, where matrices are represented using binary (1-bit) or ternary (1.58-bit) values while activations are kept in higher precision. These representations enable efficient hardware-level computation. In parallel, algorithms such as Redundant Segment Reduction (RSR) provide theoretical guarantees for accelerating low-bit matrix-vector multiplication. However, existing implementations operate at the application level and cannot be efficiently integrated into hardware kernels, limiting practical performance. To bridge this gap, we present RSR-core, a high-performance…
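To make the setting in the abstract concrete, the following pure-Python sketch multiplies a ternary weight matrix (entries in {-1, 0, +1}) by a higher-precision activation vector, and shows one simple way redundancy among low-bit row segments can be exploited: rows that share an identical segment within a column block reuse a single partial sum. This is a simplified illustration of the general idea only, not the RSR algorithm as published and not RSR-core's API; the function names, the block width `k`, and the example data are all assumptions made here.

```python
# Illustrative sketch only: function names, the block width k, and the
# example data are assumptions, not taken from the paper or from RSR-core.
from collections import defaultdict


def ternary_matvec_naive(W, x):
    """Reference product y = W @ x for W with entries in {-1, 0, +1}."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ternary_matvec_grouped(W, x, k=2):
    """Compute y = W @ x, sharing partial sums across identical k-wide row segments."""
    n_rows, n_cols = len(W), len(x)
    y = [0.0] * n_rows
    for start in range(0, n_cols, k):
        x_block = x[start:start + k]
        # Group the rows whose segment in this column block is identical.
        groups = defaultdict(list)
        for i, row in enumerate(W):
            groups[tuple(row[start:start + k])].append(i)
        # Each distinct pattern is multiplied once, then shared by all of its rows.
        for pattern, rows in groups.items():
            partial = sum(p * xi for p, xi in zip(pattern, x_block))
            for i in rows:
                y[i] += partial
    return y


# Small made-up example: 4x4 ternary weights, float activations.
W = [
    [1, 0, -1, 1],
    [1, 0, 0, -1],
    [-1, 1, -1, 1],
    [1, 0, -1, 1],
]
x = [0.5, 2.0, -1.0, 3.0]
y = ternary_matvec_grouped(W, x, k=2)
```

With only three possible values per entry, a block of width `k` has at most 3^k distinct segments, so many rows of a tall matrix necessarily repeat. The sketch rebuilds the grouping on every call for brevity; since weights are fixed at inference time, an actual engine would presumably precompute it once.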

Code & Models

Repository: UIC-InDeXLab/RSR-core (GitHub)