BOA Constrictor: A Mamba-based lossless compressor for High Energy Physics data
Akshat Gupta, Caterina Doglioni, Thomas Joseph Elliott

TL;DR
This paper introduces BOA Constrictor, a novel lossless data compressor based on the Mamba architecture. By leveraging state space models, it significantly improves compression ratios for High Energy Physics data, though its throughput is currently limited.
Contribution
The paper presents BOA Constrictor, the first Mamba-based lossless compressor tailored for HEP data, achieving state-of-the-art compression ratios through autoregressive modeling and streaming range coding.
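To illustrate why autoregressive next-byte prediction pairs naturally with a range coder, here is a minimal Python sketch. It is not the paper's model: a simple adaptive byte-frequency table stands in for the Mamba predictor, and the function name `ideal_code_length_bits` is hypothetical. The sketch only computes the ideal code length, the sum of -log2 p over the bytes actually seen, which is the size a range coder approaches up to a small overhead. A better predictor assigns higher probability to the true next byte and so yields fewer bits.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal entropy coder would spend under an adaptive order-0 model.

    Assumption: this frequency table is a stand-in for a learned
    next-byte predictor; it is not the model used in the paper.
    """
    counts = [1] * 256  # every byte value starts with count 1, so no probability is zero
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total   # predicted probability of the byte that actually occurs
        bits += -math.log2(p)   # ideal cost of coding that byte
        counts[b] += 1          # update after coding, so a decoder can mirror the model
        total += 1
    return bits


# Highly repetitive synthetic input (illustrative, not HEP data).
sample = bytes([0, 0, 0, 1] * 256)
bits = ideal_code_length_bits(sample)
ratio = len(sample) * 8 / bits
print(f"raw: {len(sample) * 8} bits, ideal coded size: {bits:.0f} bits, ratio: {ratio:.2f}")
```

Because the model is updated only after each byte is coded, a decoder holding the same initial table can reproduce every prediction, which is what makes a streaming, lossless scheme of this kind possible.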
Findings
Achieves 2.21x to 44.14x better compression ratios than LZMA.
Demonstrates effective modeling of structured HEP datasets.
Trade-off observed between compression ratio and throughput.
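The two quantities in these findings, compression ratio and throughput, can be measured for an LZMA baseline as in the sketch below. It uses Python's standard `lzma` module at preset 9 on synthetic bytes. The helper names and the reading of "x better" as the quotient of two compression ratios are assumptions for illustration, not details confirmed by the paper.

```python
import lzma
import time


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original_size / compressed_size


def lzma_baseline(data: bytes, preset: int = 9) -> tuple[float, float]:
    """Return (compression ratio, throughput in MB/s) for LZMA at the given preset."""
    start = time.perf_counter()
    compressed = lzma.compress(data, preset=preset)
    elapsed = time.perf_counter() - start
    ratio = compression_ratio(len(data), len(compressed))
    throughput = len(data) / elapsed / 1e6 if elapsed > 0 else float("inf")
    return ratio, throughput


def relative_improvement(candidate_ratio: float, baseline_ratio: float) -> float:
    """Assumed meaning of 'N x better': candidate ratio over baseline ratio."""
    return candidate_ratio / baseline_ratio


# Synthetic, highly regular input (illustrative, not HEP data).
data = bytes(range(256)) * 256
ratio, mb_per_s = lzma_baseline(data)
print(f"LZMA-9 ratio: {ratio:.1f}, throughput: {mb_per_s:.1f} MB/s")
```

Reporting both numbers side by side makes the trade-off visible: a learned compressor can win on ratio while a classical one remains faster per byte.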
Abstract
The petabyte-scale data generated annually by High Energy Physics (HEP) experiments like those at the Large Hadron Collider present a significant data storage challenge. Whilst traditional algorithms like LZMA and ZLIB are widely used, they often fail to exploit the deep structure inherent in scientific data. We investigate the application of modern state space models (SSMs) to this problem, which have shown promise for capturing long-range dependencies in sequences. We present the Bytewise Online Autoregressive (BOA) Constrictor, a novel, streaming-capable lossless compressor built upon the Mamba architecture. BOA combines an autoregressive Mamba model for next-byte prediction with a parallelised streaming range coder. We evaluate our method on three distinct structured datasets in HEP, demonstrating state-of-the-art compression ratios, improving upon LZMA-9 across all datasets. These…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
