Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang

TL;DR
This paper introduces Collaborative Decoding (CoDe), a strategy that improves the efficiency of Visual Auto-Regressive (VAR) models by splitting multi-scale inference between a large model and a small model, reducing memory use and computation with minimal quality loss.
Contribution
The paper proposes a collaborative decoding method for VAR models that improves efficiency by assigning different scales of the generation process to two models of different sizes.
Findings
Achieves a 1.7x speedup and a 50% reduction in memory usage.
Reducing the number of drafting steps further yields a 2.9x speedup, reaching 41 images/sec.
Maintains high image quality, with only a negligible increase in FID.
Abstract
In the rapidly advancing field of image generation, Visual Auto-Regressive (VAR) modeling has garnered considerable attention for its innovative next-scale prediction approach. This paradigm offers substantial improvements in efficiency, scalability, and zero-shot generalization. Yet, the inherently coarse-to-fine nature of VAR introduces a prolonged token sequence, leading to prohibitive memory consumption and computational redundancies. To address these bottlenecks, we propose Collaborative Decoding (CoDe), a novel efficient decoding strategy tailored for the VAR framework. CoDe capitalizes on two critical observations: the substantially reduced parameter demands at larger scales and the exclusive generation patterns across different scales. Based on these insights, we partition the multi-scale inference process into a seamless collaboration between a large model and a small model.…
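The partition described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the stub models, the `collaborative_decode` helper, the scale schedule, and the choice of six drafting steps are all assumptions made for the example. It assumes the large model handles the early, coarse scales and the small model handles the remaining larger scales, which is consistent with the observation that parameter demands drop at larger scales.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a VAR transformer: given the token maps generated
# so far and the side length of the next scale, return that scale's token map.
Model = Callable[[List[List[int]], int], List[int]]


def make_stub_model(tag: int) -> Model:
    """Return a toy 'model' that fills the next scale with a constant tag."""
    def predict(context: List[List[int]], side: int) -> List[int]:
        return [tag] * (side * side)
    return predict


def collaborative_decode(
    scales: List[int],
    draft_steps: int,
    large_model: Model,
    small_model: Model,
) -> Tuple[List[List[int]], List[str]]:
    """Generate one token map per scale, switching models after draft_steps."""
    context: List[List[int]] = []
    roles: List[str] = []
    for step, side in enumerate(scales):
        # Assumption: the first `draft_steps` (coarse) scales go to the large
        # model; every later (finer) scale goes to the small model.
        if step < draft_steps:
            model, role = large_model, "large"
        else:
            model, role = small_model, "small"
        context.append(model(context, side))
        roles.append(role)
    return context, roles


# Assumed coarse-to-fine schedule of token-map side lengths (illustrative).
scales = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]
token_maps, roles = collaborative_decode(
    scales, draft_steps=6,
    large_model=make_stub_model(1), small_model=make_stub_model(0),
)

large_tokens = sum(len(m) for m, r in zip(token_maps, roles) if r == "large")
total_tokens = sum(len(m) for m in token_maps)
print(roles)
print(f"large model produced {large_tokens} of {total_tokens} tokens")
```

Under these assumed numbers, the large model produces only 91 of the 680 tokens, since the final few scales dominate the sequence length. Lowering `draft_steps` shifts even more of the work to the small model, which mirrors the trade-off between speed and quality noted in the findings.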
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Brain Tumor Detection and Classification
Methods: Softmax · Attention Is All You Need
