Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
Xin Ding, Shijie Cao, Ting Cao, Zhibo Chen

TL;DR
This paper investigates how quantization affects vision generative models, revealing that language-style models have better bit-level scaling laws than diffusion-style models due to their discrete representation space, and proposes TopKLD to improve quantization performance.
Contribution
The study systematically compares quantization effects on diffusion and language-style vision models and introduces TopKLD to enhance their bit-level scaling laws.
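The summary does not say how TopKLD is computed. As a rough illustration only, the sketch below assumes a distillation loss that takes the KL divergence between teacher and student token distributions, restricted to the teacher's k most probable tokens and renormalized on that subset. The names `softmax`, `topk_kl`, and `k` are hypothetical, and the paper's actual TopKLD formulation may differ.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_kl(teacher_logits, student_logits, k):
    """Hypothetical top-k KL distillation loss (an assumption, not the paper's definition).

    Computes KL(p_teacher || p_student) over the teacher's k most probable
    tokens, after renormalizing both distributions on that subset.
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    # Indices of the teacher's k highest-probability tokens.
    top = sorted(range(len(p_t)), key=lambda i: p_t[i], reverse=True)[:k]
    z_t = sum(p_t[i] for i in top)
    z_s = sum(p_s[i] for i in top)
    loss = 0.0
    for i in top:
        q_t = p_t[i] / z_t
        q_s = p_s[i] / z_s
        loss += q_t * math.log(q_t / q_s)
    return loss


teacher = [4.0, 2.5, 1.0, -1.0, -3.0]  # e.g. full-precision model logits
student = [3.5, 2.8, 0.5, -0.5, -2.0]  # e.g. quantized model logits
print(topk_kl(teacher, teacher, k=3))  # identical inputs give 0.0
print(topk_kl(teacher, student, k=3))  # positive when the top-k shapes differ
```

Restricting the divergence to the teacher's top-k tokens ignores the long tail of low-probability tokens; setting k to the vocabulary size recovers the ordinary full-vocabulary KL divergence.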
Findings
Language-style models outperform diffusion models under quantization.
Discrete representation space contributes to better quantization tolerance.
TopKLD improves bit-level scaling laws across quantization methods.
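The second finding attributes the better quantization tolerance of language-style models to their discrete representation space. One intuition, sketched below under simplifying assumptions: when a perturbed vector is mapped back to its nearest codebook entry, small perturbations are absorbed entirely, and only perturbations large enough to cross into another entry's region change the token. The 2-D codebook, the Gaussian noise model, and the function names are illustrative assumptions, not taken from the paper.

```python
import random


def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )


def token_flip_rate(codebook, noise_std, trials=1000, seed=0):
    """Fraction of trials in which Gaussian noise on a code vector changes its token."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        idx = rng.randrange(len(codebook))
        noisy = [c + rng.gauss(0.0, noise_std) for c in codebook[idx]]
        if nearest_code(noisy, codebook) != idx:
            flips += 1
    return flips / trials


# Hypothetical 2-D codebook whose entries are 10 units apart.
codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
for std in (0.5, 2.0, 5.0):
    print(std, token_flip_rate(codebook, std))
```

In a continuous representation the same noise would pass through unchanged; here the token stays the same until the noise becomes comparable to half the spacing between neighboring codes.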
Abstract
Vision generative models have recently made significant advancements along two primary paradigms: diffusion-style and language-style, both of which have demonstrated excellent scaling laws. Quantization is crucial for efficiently deploying these models, as it reduces memory and computation costs. In this work, we systematically investigate the impact of quantization on these two paradigms. Surprisingly, despite achieving comparable performance in full precision, language-style models consistently outperform diffusion-style models across various quantization settings. This observation suggests that language-style models have superior bit-level scaling laws, offering a better tradeoff between model quality and total bits. To dissect this phenomenon, we conduct extensive experiments and find that the primary reason is the discrete representation space of language-style models, which is…
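The abstract frames bit-level scaling laws as a tradeoff between model quality and total bits. In the formulation used by work on k-bit inference scaling laws, total model bits is the parameter count times the bits per parameter, so a larger model quantized to fewer bits can occupy the same budget as a smaller full-precision one. The helpers below are a minimal sketch of that bookkeeping; the parameter counts are hypothetical and do not correspond to the models evaluated in the paper.

```python
def total_model_bits(num_params, bits_per_param):
    """Storage cost in bits: parameter count times bit width.

    Ignores quantization metadata such as per-channel scales.
    """
    return num_params * bits_per_param


def params_within_budget(bit_budget, bits_per_param):
    """Largest parameter count that fits a fixed total-bit budget at a given precision."""
    return bit_budget // bits_per_param


# Hypothetical budget: a 1B-parameter model stored in 16-bit precision.
budget = total_model_bits(1_000_000_000, 16)
for bits in (16, 8, 4, 3):
    print(f"{bits}-bit: up to {params_within_budget(budget, bits):,} parameters")
```

At this fixed budget of 16 billion bits, 4-bit weights allow four times as many parameters as 16-bit weights; a model family with a better bit-level scaling law reaches higher quality at the same total-bit budget.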
Peer Reviews
Decision · Submitted to ICLR 2025
- The paper provides a comprehensive study of how quantization affects two major paradigms of vision generative models, which is crucial for deploying these models efficiently. The finding that language-style models have superior bit-level scaling laws compared to diffusion-style models might also shed light on further model optimization and deployment.
- The proposed TopKLD method for knowledge distillation during the quantization process is innovative and shows experimental promise in impr…
- The major weakness of this work is its limited scope. As VAR and DiT are specific instances of language-style and diffusion-style vision generative models, respectively, their behavior may not carry over to other types of vision generative models. Compared to the original paper on k-bit inference scaling laws, the set of models studied is relatively small, which makes it unclear whether the conclusions generalize to other model types.
- The authors provide some analysis of the reasons behind the models' scaling behaviors and dis…
1. The paper investigates bit-level scaling laws in quantized vision generative models, specifically comparing diffusion-style and language-style models. The authors find that while both model types perform similarly in full precision, language-style models consistently exhibit superior bit-level scaling across various quantization settings. This robustness is attributed to the discrete representation space of language-style models, which enhances resilience to quantization noise.
2. The authors pro…
1. Inconsistent Scaling Comparison in Figure 1: The paper aims to show that language-style models have superior bit-level scaling compared to diffusion-style models. However, the models compared in Figure 1 have different initial total model bits and compute bits, which may itself cause scaling variations. This discrepancy introduces an additional variable that weakens the effectiveness of Figure 1 in supporting the authors’ claim. Aligning the initial bit settings could help provide a clearer, more…
This paper demonstrates the bit-level scaling laws of image generative models through comprehensive experiments in terms of model bits and compute bits. By analyzing the reconstruction error of intermediate representations in VAR and DiT, the paper concludes that VAR is more robust to quantization and that this finding could generalize to other discrete auto-regressive models. Further, the paper proposes TopKLD, a quantization-aware training process, to improve the scaling behavior of VAR in the low-bit regio…
Bit-level scaling laws and the robustness of discrete auto-regressive models seem intuitive and straightforward; therefore, the main contribution of this paper is the proposed quantization method, TopKLD. For a knowledge-distillation-based quantization-aware training method, the comparisons and ablation studies are insufficient.
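One review notes that the paper analyzes the reconstruction error of intermediate representations in VAR and DiT under quantization. A common way to measure such error is the relative L2 distance between a full-precision tensor and its quantized-then-dequantized counterpart. The sketch below assumes symmetric per-tensor uniform quantization and a synthetic activation vector; the paper's exact quantizer and error metric are not specified in this summary, and the function names are hypothetical.

```python
import math


def fake_quantize(values, bits):
    """Symmetric per-tensor uniform quantization, then dequantization.

    Values are scaled to integers in [-qmax, qmax] with qmax = 2**(bits-1) - 1,
    rounded, clamped, and mapped back to floats.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for symmetric quantization")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def relative_l2_error(reference, approx):
    """||reference - approx||_2 / ||reference||_2."""
    num = math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, approx)))
    den = math.sqrt(sum(r * r for r in reference))
    return num / den


# Synthetic stand-in for an intermediate activation vector.
activation = [math.sin(0.37 * i) + 0.1 * math.cos(1.3 * i) for i in range(256)]
for bits in (8, 6, 4, 3):
    err = relative_l2_error(activation, fake_quantize(activation, bits))
    print(f"{bits}-bit relative error: {err:.4f}")
```

Comparing this error layer by layer for two models at the same bit width is one way to see whose internal representations degrade faster as precision drops.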
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Evolutionary Algorithms and Applications
