QStore: Quantization-Aware Compressed Model Storage

Raunak Shah; Zhaoheng Li; Yongjoo Park

arXiv:2505.04081 · cs.DB · October 21, 2025

TL;DR

QStore is a lossless compression format for storing a model in multiple precisions. It stores the low-precision model together with only the residual information needed to reconstruct the high-precision model, which reduces storage cost while keeping load times fast for both the low- and high-precision models.

Contribution

QStore introduces a unified, lossless compression method that stores low-precision models and residuals to reconstruct high-precision models, saving storage without sacrificing speed.
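
The idea of keeping a low-precision model plus a residual can be illustrated with a minimal Python sketch. This is not the encoding used by QStore; it is a toy scheme under stated assumptions: weights are finite BF16 values held as 16-bit patterns, the low-precision model is a simple absmax INT8 quantization, and the residual is the XOR between each true bit pattern and the pattern predicted from the quantized value, compressed with zlib. The names `to_bf16_bits`, `from_bf16_bits`, `encode`, and `decode` are hypothetical helpers, not part of any QStore API.

```python
import random
import struct
import zlib


def to_bf16_bits(x):
    # Keep the top 16 bits of the float32 encoding (truncation, not rounding).
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def from_bf16_bits(b):
    # Expand a 16-bit BF16-style pattern back into a Python float.
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]


def encode(hi_bits):
    """Split BF16 bit patterns into an INT8 model plus a compressed residual."""
    values = [from_bf16_bits(b) for b in hi_bits]
    # Assumed absmax scaling; fall back to 1.0 if every weight is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    # Residual: XOR of the true pattern with the pattern predicted from q.
    residual = [b ^ to_bf16_bits(v * scale) for b, v in zip(hi_bits, q)]
    blob = zlib.compress(struct.pack(f">{len(residual)}H", *residual))
    return q, scale, blob


def decode(q, scale, blob):
    """Rebuild the exact BF16 bit patterns from the INT8 model and residual."""
    residual = struct.unpack(f">{len(q)}H", zlib.decompress(blob))
    return [r ^ to_bf16_bits(v * scale) for r, v in zip(residual, q)]


random.seed(0)
hi = [to_bf16_bits(random.gauss(0.0, 0.02)) for _ in range(4096)]
q, scale, blob = encode(hi)
assert decode(q, scale, blob) == hi  # lossless round trip
print(len(hi) * 2, "bytes of BF16 vs", len(blob), "bytes of compressed residual")
```

In this sketch a user who only needs the low-precision model reads `q` and `scale` and never touches the residual, while a user who needs the high-precision model reads both. The XOR step makes reconstruction exact by construction, because the decoder recomputes the same prediction from `q` and `scale`.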

Findings

01

Reduces storage footprint by up to 2.2x (to about 45% of the original size)

02

Enables up to 1.7x faster model saving

03

Enables up to 1.8x faster model loading

Abstract

Modern applications commonly leverage large, multi-modal foundation models. These applications often feature complex workflows that demand the storage and usage of similar models in multiple precisions. A straightforward approach is to maintain a separate file for each model precision (e.g., INT8, BF16), which is indeed the approach taken by many model providers such as HuggingFace and Ollama. However, this approach incurs excessive storage costs since a higher precision model (e.g., BF16) is a strict superset of a lower precision model (e.g., INT8) in terms of information. Unfortunately, simply maintaining only the higher-precision model and requiring every user to dynamically convert the model precision is not desirable because every user of lower precision models must pay the cost for model download and precision conversion. In this paper, we present QStore, a unified, lossless…

Code & Models

Repositories

illinoisdata/qstore (PyTorch, official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Database Systems and Queries