TStore: Rethinking AI Model Hub with Tensor-Centric Compression

Tingfeng Lan, Zirui Wang, Yunjia Zheng, Zhaoyuan Su, Juncheng Yang, and Yue Cheng

arXiv:2604.17104 · cs.DC · May 14, 2026

TL;DR

TStore introduces a tensor-centric system that reduces AI model storage by identifying redundancy through tensor-level fingerprinting and clustering, enabling efficient compression without sacrificing model performance.

Contribution

The paper presents TStore, a novel tensor-centric approach for fine-grained deduplication and compression in AI model hubs, addressing storage challenges.

Findings

01. Achieves significant storage savings in real-world model repositories.

02. Maintains model usability and performance after compression.

03. Introduces tensor-level fingerprinting and clustering for redundancy detection.

Abstract

Modern AI models are growing rapidly in size and redundancy, leading to significant storage and distribution challenges in model hubs. We present TStore, a tensor-centric system for reducing storage overhead through fine-grained deduplication and compression. TStore leverages tensor-level fingerprinting and clustering to identify redundancy across models without requiring annotations. Our design enables efficient storage reduction while preserving model usability and performance. Experiments on real-world model repositories demonstrate substantial storage savings with minimal overhead.
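To make the idea of tensor-level fingerprinting concrete, the following is a minimal sketch of exact-match deduplication across two models. It is not TStore's implementation: the abstract does not specify the fingerprint function, so SHA-256 over shape, dtype, and raw bytes is an assumption, and the names `fingerprint`, `deduplicate`, and `f32`, as well as the toy tensors, are hypothetical. The clustering of similar (non-identical) tensors mentioned in the abstract is not covered here.

```python
import hashlib
from array import array


def fingerprint(shape, dtype, payload):
    """Content hash over shape, dtype, and raw bytes (assumed scheme, not the paper's)."""
    h = hashlib.sha256()
    h.update(repr((tuple(shape), dtype)).encode())
    h.update(payload)
    return h.hexdigest()


def deduplicate(models):
    """Store each unique tensor once.

    models maps model name -> {tensor name: (shape, dtype, payload bytes)}.
    Returns (store, manifests, raw_bytes, stored_bytes).
    """
    store = {}      # fingerprint -> payload
    manifests = {}  # model name -> {tensor name: fingerprint}
    raw_bytes = 0
    for model, tensors in models.items():
        manifests[model] = {}
        for name, (shape, dtype, payload) in tensors.items():
            raw_bytes += len(payload)
            fp = fingerprint(shape, dtype, payload)
            store.setdefault(fp, payload)  # keep only the first copy
            manifests[model][name] = fp
    stored_bytes = sum(len(p) for p in store.values())
    return store, manifests, raw_bytes, stored_bytes


def f32(values):
    """Pack Python floats as 32-bit floats (toy stand-in for tensor data)."""
    return array("f", values).tobytes()


# Hypothetical base model and a fine-tune that leaves the embedding untouched.
base = {
    "embed.weight": ((2, 2), "float32", f32([0.1, 0.2, 0.3, 0.4])),
    "head.weight": ((2,), "float32", f32([1.0, 2.0])),
}
finetuned = {
    "embed.weight": base["embed.weight"],
    "head.weight": ((2,), "float32", f32([1.5, 2.5])),
}

store, manifests, raw_bytes, stored_bytes = deduplicate(
    {"base": base, "finetuned": finetuned}
)
print(f"raw={raw_bytes} B, stored={stored_bytes} B, unique tensors={len(store)}")
```

Each model is reduced to a manifest of fingerprints, so the shared embedding is stored once and both models can still be reconstructed by looking up their tensors in `store`. Working at tensor granularity rather than whole-file granularity is what lets a fine-tune that changes only a few layers share the rest with its base model.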
