OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data
Yan Zhao, Zhengxue Cheng, Junxuan Zhang, Dajiang Zhou, Qunshan Gu, Qi Wang, Li Song

TL;DR
OmniZip is a lightweight, unified lossless compressor designed for various multi-modal data types, achieving high compression efficiency and real-time performance on edge devices.
Contribution
OmniZip introduces a unified architecture with modality-unified tokenization and routing mechanisms, enabling multi-modal lossless compression in a single lightweight model.
Findings
Outperforms state-of-the-art compressors on multiple datasets.
Achieves 42-62% higher compression efficiency than gzip.
Supports near real-time inference on resource-constrained devices.
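For context on the gzip comparison, a baseline figure can be measured with Python's standard `gzip` module. This is a minimal sketch, not the paper's evaluation protocol: the helpers `bits_per_byte` and `relative_saving`, the sample input, and the reading of "compression efficiency" as a relative reduction in compressed size are all assumptions made for illustration.

```python
import gzip


def bits_per_byte(raw: bytes, level: int = 9) -> float:
    """Compressed bits per input byte under gzip (hypothetical helper)."""
    if not raw:
        raise ValueError("input must be non-empty")
    compressed = gzip.compress(raw, compresslevel=level)
    return 8 * len(compressed) / len(raw)


def relative_saving(baseline_bpb: float, other_bpb: float) -> float:
    """Fractional size reduction relative to a baseline (assumed metric,
    not necessarily the definition used in the paper)."""
    return 1.0 - other_bpb / baseline_bpb


# Illustrative input only; real evaluations use full datasets.
sample = b"lossless compression " * 200
gzip_bpb = bits_per_byte(sample)

# Lossless means the round trip restores the input exactly.
assert gzip.decompress(gzip.compress(sample)) == sample
```

Under this assumed metric, a compressor that needs half as many bits per byte as gzip would score `relative_saving(gzip_bpb, gzip_bpb / 2) == 0.5`.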
Abstract
Lossless compression is essential for efficient data storage and transmission. Although learning-based lossless compressors achieve strong results, most of them are designed for a single modality, leading to redundant compressor deployments in multi-modal settings. Designing a unified multi-modal compressor is critical yet challenging, as different data types vary largely in format, dimension, and statistics. Multi-modal large language models offer a promising resolution but remain too complex for practical use. Thus, we propose OmniZip, a unified and lightweight lossless compressor for multi-modal data (like image, text, speech, tactile, database, and gene sequence). Built on a lightweight backbone, OmniZip incorporates three key components to enable efficient multi-modal lossless compression: a modality-unified tokenizer that reversibly transforms diverse data into…
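The reversibility requirement on the tokenizer can be illustrated with a toy round trip. The sketch below is not OmniZip's tokenizer, whose details are not given here; it uses raw byte values as a stand-in token stream, and the function names `to_tokens` and `from_tokens` and the sample inputs are hypothetical. It only shows the property a lossless pipeline depends on: decoding the tokens must return the original data bit for bit, whatever the modality.

```python
def to_tokens(data: bytes) -> list[int]:
    """Map each byte to an integer token in [0, 255] (stand-in scheme)."""
    return list(data)


def from_tokens(tokens: list[int]) -> bytes:
    """Exact inverse of to_tokens."""
    return bytes(tokens)


# Illustrative inputs for a few modalities named in the abstract.
samples = {
    "text": "OmniZip".encode("utf-8"),
    "gene": b"ACGTACGT",
    "image_row": bytes([0, 127, 255, 64]),
}

for name, raw in samples.items():
    # Reversibility: tokenizing and detokenizing must be the identity.
    assert from_tokens(to_tokens(raw)) == raw, name
```

Any tokenizer used for lossless compression has to pass a check of this kind, since information lost at tokenization cannot be recovered by the decoder.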
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
