Investigating the Fundamental Limit: A Feasibility Study of Hybrid-Neural Archival

Marcus Armstrong; ZiWei Qiu; Huy Q. Vo; Arjun Mukherjee

arXiv:2603.25526 · cs.IT · March 27, 2026

Open Access

TL;DR

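This exploratory study asks whether Large Language Models can serve as lossless compressors for archival storage. It introduces a proof-of-concept architecture, Hybrid-LLM, and a logit quantization protocol that neutralizes hardware non-determinism so that neural compression rates can be measured on real-world data.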

Contribution

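The paper presents Hybrid-LLM, a proof-of-concept archival system, and introduces a logit quantization protocol that addresses the "GPU Butterfly Effect" (microscopic hardware non-determinism that otherwise precludes data recovery), making it possible to measure neural compression rates.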

Findings

01. LLMs achieve 0.39 bits per character (BPC) on memorized data

02. LLMs achieve 0.75 BPC on unseen data

03. Inference latency is significantly higher than that of classical methods
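For a sense of scale, the BPC (bits per character) figures above can be converted into compression ratios. The sketch below is illustrative only: it assumes a raw baseline of 8 bits per character, which this summary does not state, and the function names are hypothetical rather than taken from the paper.

```python
import math


def bits_per_character(token_probs, num_chars):
    """Ideal code length of a text, in bits per character.

    token_probs: probability the model assigned to each token that
                 actually occurred (an entropy coder spends about
                 -log2(p) bits on a symbol of probability p).
    num_chars:   number of characters in the original text.
    """
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / num_chars


def compression_ratio(bpc, raw_bits_per_char=8.0):
    """Ratio of raw size to compressed size.

    raw_bits_per_char=8.0 is an assumption (one byte per character).
    """
    return raw_bits_per_char / bpc


# Reported figures from the findings above.
for label, bpc in [("memorized data", 0.39), ("unseen data", 0.75)]:
    print(f"{label}: {bpc} BPC, about {compression_ratio(bpc):.1f}x smaller")
```

Under that 8-bit assumption, 0.39 BPC corresponds to roughly a 20.5x reduction and 0.75 BPC to roughly 10.7x.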

Abstract

Large Language Models (LLMs) possess a theoretical capability to model information density far beyond the limits of classical statistical methods (e.g., Lempel-Ziv). However, utilizing this capability for lossless compression involves navigating severe system constraints, including non-deterministic hardware and prohibitive computational costs. In this work, we present an exploratory study into the feasibility of LLM-based archival systems. We introduce Hybrid-LLM, a proof-of-concept architecture designed to investigate the "entropic capacity" of foundation models in a storage context. We identify a critical barrier to deployment: the "GPU Butterfly Effect," where microscopic hardware non-determinism precludes data recovery. We resolve this via a novel logit quantization protocol, enabling the rigorous measurement of neural compression rates on real-world data. Our…
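The excerpt does not spell out the logit quantization protocol, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that logits are snapped to a coarse grid before being turned into the integer frequency table an entropy coder would consume, so that tiny floating-point differences between the encoding and decoding machines vanish. The grid size `step`, the frequency budget `total`, and both function names are hypothetical.

```python
import math


def quantize_logits(logits, step=0.25):
    """Snap each logit to the nearest integer multiple of `step`.

    Returns integers, so later arithmetic no longer depends on the
    low-order bits of the original floating-point logits.
    `step` is a hypothetical grid size chosen for illustration.
    """
    return [round(x / step) for x in logits]


def integer_frequencies(q_logits, step=0.25, total=1 << 16):
    """Turn quantized logits into integer frequencies for an entropy coder.

    Every symbol gets a frequency of at least 1 so it stays encodable.
    The frequencies need not sum exactly to `total`; a coder can use
    sum(freqs) as its denominator.
    """
    top = max(q_logits)
    weights = [math.exp((q - top) * step) for q in q_logits]
    norm = sum(weights)
    return [max(1, int(w / norm * total)) for w in weights]


# Simulate encoder and decoder logits that differ by microscopic noise.
encoder_logits = [2.10, -0.30, 0.90, 1.42]
decoder_logits = [x + 1e-6 for x in encoder_logits]

enc_freqs = integer_frequencies(quantize_logits(encoder_logits))
dec_freqs = integer_frequencies(quantize_logits(decoder_logits))
print(enc_freqs == dec_freqs)
```

A limitation of this sketch is that a logit lying very close to a rounding boundary can still flip to a different grid point under noise; the excerpt does not say how the paper's protocol handles that case. A coarser grid also reduces such flips at the cost of a less precise distribution and therefore a higher bit rate.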

Taxonomy

Topics: Advanced Data Storage Technologies · Big Data and Digital Economy · Data Quality and Management