RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression
Yuchao Qin, Anjunyi Fan, Bonan Yan

TL;DR
RAS is a specialized hardware architecture that accelerates rANS-based neural lossless compression, reporting 121.2x encode and 70.9x decode speedups over a Python rANS baseline while remaining bit-exact, with the aim of making high-performance data center compression practical.
Contribution
The paper introduces RAS, a hardware accelerator for rANS that integrates probabilistic models, reduces logic and memory costs, and scales throughput for neural lossless compression.
Findings
121.2x encode speedup over Python rANS
70.9x decode speedup over Python rANS
Higher compression ratios with neural models
Abstract
Data centers handle vast volumes of data that require efficient lossless compression, yet emerging methods based on probabilistic models are often computationally slow. To address this, we introduce RAS, the Range Asymmetric Numeral System Acceleration System, a hardware architecture that integrates the rANS algorithm into a lossless compression pipeline and eliminates key bottlenecks. RAS couples an rANS core with a probabilistic generator, storing distributions in BF16 format and converting them once into a fixed-point domain shared by a unified division/modulo datapath. A two-stage rANS update with byte-level re-normalization reduces logic cost and memory traffic, while a prediction-guided decoding path speculatively narrows the cumulative distribution function (CDF) search window and safely falls back to maintain bit-exactness. A multi-lane organization scales throughput and enables…
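To make the abstract's moving parts concrete, here is a minimal software sketch of byte-renormalized rANS with a windowed CDF lookup that falls back to a full search. It is not the RAS hardware design. Everything specific in it is an assumption made for illustration: the 12-bit fixed-point precision (PROB_BITS), the 2^23 lower state bound (RANS_L), the helper names quantize and find_symbol, the use of ordinary Python floats in place of BF16, a single static distribution rather than per-symbol model outputs, and the choice of the most probable symbol as the decoder's guess.

```python
from bisect import bisect_right

PROB_BITS = 12        # assumed fixed-point precision: frequencies sum to 2**12
RANS_L = 1 << 23      # assumed lower bound of the normalized state interval


def quantize(probs, bits=PROB_BITS):
    """Convert float probabilities once into integer frequencies and a CDF."""
    total = 1 << bits
    freqs = [max(1, int(p * total)) for p in probs]  # no symbol gets frequency 0
    top = max(range(len(freqs)), key=freqs.__getitem__)
    freqs[top] += total - sum(freqs)                 # absorb rounding error
    assert freqs[top] > 0
    cdf = [0]
    for f in freqs:
        cdf.append(cdf[-1] + f)
    return freqs, cdf


def encode(symbols, freqs, cdf):
    """rANS encode; symbols are processed in reverse so decoding runs forward."""
    x = RANS_L
    out = bytearray()
    for s in reversed(symbols):
        f, start = freqs[s], cdf[s]
        x_max = ((RANS_L >> PROB_BITS) << 8) * f
        while x >= x_max:                # byte-level renormalization
            out.append(x & 0xFF)
            x >>= 8
        q, r = divmod(x, f)              # one division yields quotient and remainder
        x = (q << PROB_BITS) + r + start
    for _ in range(4):                   # flush the final 32-bit state
        out.append(x & 0xFF)
        x >>= 8
    out.reverse()                        # decoder consumes bytes in reverse emit order
    return bytes(out)


def find_symbol(cdf, slot, guess, window=1):
    """Search a narrow CDF window around `guess`; fall back to the full CDF."""
    lo = max(0, guess - window)
    hi = min(len(cdf) - 2, guess + window)
    if cdf[lo] <= slot < cdf[hi + 1]:    # the speculative window contains the slot
        return bisect_right(cdf, slot, lo, hi + 2) - 1
    return bisect_right(cdf, slot) - 1   # fallback gives the same exact answer


def decode(data, count, freqs, cdf):
    """rANS decode of `count` symbols using the windowed lookup."""
    guess = max(range(len(freqs)), key=freqs.__getitem__)  # assumed predictor
    mask = (1 << PROB_BITS) - 1
    x = int.from_bytes(data[:4], "big")
    pos = 4
    out = []
    for _ in range(count):
        slot = x & mask
        s = find_symbol(cdf, slot, guess)
        out.append(s)
        x = freqs[s] * (x >> PROB_BITS) + slot - cdf[s]
        while x < RANS_L:                # pull bytes back in to renormalize
            x = (x << 8) | data[pos]
            pos += 1
    return out


if __name__ == "__main__":
    freqs, cdf = quantize([0.5, 0.25, 0.15, 0.1])
    message = [0, 1, 2, 3, 0, 0, 1, 2] * 50
    packed = encode(message, freqs, cdf)
    assert decode(packed, len(message), freqs, cdf) == message
    print(len(message), "symbols ->", len(packed), "bytes")
```

The windowed path and the fallback path always return the same symbol, so the speculation only changes how much of the CDF is searched, never the decoded output. That property is what a bit-exact decoder needs from any such shortcut.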
Taxonomy
Topics: Advanced Data Compression Techniques · Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
