CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
Keming Ye, Zhou Zhao, Fan Wu, Shengyu Zhang

TL;DR
CIAR is a collaborative cloud-device framework that accelerates auto-regressive image generation by using interval-based uncertainty quantification and decoding, significantly reducing cloud requests and processing time without sacrificing quality.
Contribution
This work introduces an interval-based uncertainty quantification and decoding approach for AR models, enabling efficient on-device processing and reducing cloud dependency in image generation.
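The sketch below makes interval-based uncertainty concrete under hypothetical assumptions. It assumes the device model yields several probability estimates for its top candidate token and takes their spread as a continuous probability interval. The interval width serves as the uncertainty score. The names `ProbInterval`, `interval_from_estimates` and `accept_on_device`, and the thresholds `min_lower` and `max_width`, are illustrative assumptions, not CIAR's actual Inter-Head.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbInterval:
    """A continuous probability interval [lower, upper] for one candidate token."""
    lower: float
    upper: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("expected 0 <= lower <= upper <= 1")

    @property
    def width(self) -> float:
        # A wider interval means more uncertainty about this token.
        return self.upper - self.lower


def interval_from_estimates(estimates: Sequence[float]) -> ProbInterval:
    """Hypothetical: bound a token's probability by the spread of several estimates."""
    return ProbInterval(min(estimates), max(estimates))


def accept_on_device(iv: ProbInterval, min_lower: float = 0.5, max_width: float = 0.2) -> bool:
    """Keep the token locally only if its pessimistic bound is high and the interval is narrow."""
    return iv.lower >= min_lower and iv.width <= max_width


# A token in a flat, homogeneous region: the estimates agree, so the interval is narrow.
flat = interval_from_estimates([0.82, 0.88, 0.85])
# A token on an object boundary: the estimates disagree, so the interval is wide.
edge = interval_from_estimates([0.25, 0.60, 0.40])
print(accept_on_device(flat), accept_on_device(edge))  # True False
```

The acceptance test requires a high lower bound rather than a high point estimate. Under this choice, tokens with wide intervals, such as those at object boundaries, fall back to the cloud even when their average probability looks acceptable.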
Findings
Achieves 2.18x speed-up in image generation
Reduces cloud requests by 70%
Maintains high image quality and semantic consistency
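The two headline figures translate into absolute terms as follows. This assumes speed-up is defined in the usual way, as baseline wall-clock time divided by CIAR's wall-clock time; the baseline itself is not specified here. The symbols $T$ (generation time) and $R$ (number of cloud requests) are labels introduced only for this calculation.

```latex
\[
\frac{T_{\mathrm{CIAR}}}{T_{\mathrm{base}}} = \frac{1}{2.18} \approx 0.459
\quad\Rightarrow\quad
\text{about } 54\% \text{ less generation time}
\]
\[
R_{\mathrm{CIAR}} = (1 - 0.70)\, R_{\mathrm{base}} = 0.30\, R_{\mathrm{base}}
\]
```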
Abstract
Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this with CIAR, a cloud-device collaboration framework that uses on-device self-verification to handle two key properties of visual synthesis. The first is the vast token vocabulary required for high-fidelity images. The second is inherent spatial redundancy, which makes tokens in homogeneous regions extremely predictable while tokens at object boundaries remain highly uncertain. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals to accelerate processing and make it feasible for large visual…
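The cloud-device split can be illustrated with a minimal per-token sketch, under stated assumptions. The device proposes each token together with a probability interval and keeps it when the interval is confident. Otherwise it defers that position to the cloud model and counts a request. The callables `device_step` and `cloud_step`, the thresholds, and the toy stand-ins are all hypothetical. The loop also does not model batching or parallel verification of deferred tokens, which a practical system would likely need.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (lower, upper) probability bounds for a drafted token


def collaborative_decode(
    num_tokens: int,
    device_step: Callable[[List[int]], Tuple[int, Interval]],
    cloud_step: Callable[[List[int]], int],
    min_lower: float = 0.5,
    max_width: float = 0.2,
) -> Tuple[List[int], int]:
    """Toy cloud-device loop: keep confident device tokens, defer uncertain ones to the cloud.

    Returns the generated token ids and the number of cloud requests issued.
    """
    tokens: List[int] = []
    cloud_requests = 0
    for _ in range(num_tokens):
        draft, (lower, upper) = device_step(tokens)
        if lower >= min_lower and upper - lower <= max_width:
            tokens.append(draft)  # self-verified on device, no network round trip
        else:
            tokens.append(cloud_step(tokens))  # uncertain: ask the large cloud model
            cloud_requests += 1
    return tokens, cloud_requests


# Toy stand-ins: even positions mimic predictable flat regions, odd ones mimic edges.
def toy_device(prefix: List[int]) -> Tuple[int, Interval]:
    pos = len(prefix)
    return (pos, (0.90, 0.95)) if pos % 2 == 0 else (pos, (0.20, 0.90))


def toy_cloud(prefix: List[int]) -> int:
    return 99  # placeholder token id returned by the "cloud"


tokens, requests = collaborative_decode(6, toy_device, toy_cloud)
print(tokens, requests)  # [0, 99, 2, 99, 4, 99] 3
```

Only deferred tokens cost a cloud round trip. The share of tokens accepted on device therefore directly determines how many requests are saved.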
Peer Reviews
Decision: ICLR 2026 Poster
The paper clearly identifies bottlenecks in AR-based image generation for on-device deployment, specifically excessive cloud requests and the inefficiency of uniform token verification, and it motivates a focused solution. The introduction of interval-based uncertainty (via the “Inter-Head”) in the device model is a creative step toward efficiently identifying tokens that can be safely accepted on-device, addressing the challenge posed by large visual vocabularies and spatial redundancy…
Despite the strong baseline coverage, the paper omits a direct comparison with, and discussion of, several directly relevant works. In particular, “Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient” (Chen et al., 2025) is highly relevant and should be both cited and compared empirically (e.g., as a baseline in Tables 1 and 2 and in the distributional alignment discussion). Failing to engage with such work weakens the paper's claim to advance beyond prior methods. While the role of prefix r…
1. This article is written clearly and logically. 2. This article analyzes the challenges of deploying autoregressive image generation models on devices and proposes an edge-cloud collaborative framework. 3. The proposed uncertainty estimation method is interesting. 4. Comprehensive experiments demonstrate the superiority of the proposed method compared to other acceleration baselines and its advantage over other uncertainty estimation methods.
1. The paper lacks details on efficiency measurement. What hardware was used for the latency and speed-up tests? The paper does not appear to specify this. 2. As an edge-cloud collaborative method, what is CIAR's communication cost? Were any latency tests conducted with CIAR deployed in a real edge-cloud setting? 3. Can the proposed method also be applied to VAR, the next-scale prediction model?
1. Practical Impact: The method effectively addresses key bottlenecks in on-device AR image generation (network latency, computational cost) and offers a deployable solution. 2. Comprehensive Evaluation: Extensive experiments across multiple models (LlamaGen, Anole) and metrics (FID, CLIP, speed-up) convincingly demonstrate the advantages over strong baselines. 3. Theoretical Grounding: The appendix provides rigorous mathematical justification for the uncertainty metric and interval fusion ope…
1. Ablation Clarity: While ablation studies are included, the relative contribution of each component (Inter-Head, prefix injection, Inter-DRO loss) to the overall performance could be more clearly disentangled. 2. Generalization: Experiments are limited to specific AR architectures (LlamaGen, Anole); it's unclear how well CIAR generalizes to other AR or non-AR generative models.
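One review credits the appendix with justifying an "interval fusion" operation, whose definition is not given here. For readers unfamiliar with the idea, the sketch below shows one generic way to fuse probability intervals from several sources. It returns their intersection when the sources agree and falls back to the covering hull when they conflict. This rule and the name `fuse_intervals` are assumptions for illustration, not necessarily the operation CIAR defines.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper)


def fuse_intervals(intervals: Sequence[Interval]) -> Interval:
    """Fuse probability intervals from several sources (generic rule, hypothetical).

    If the sources agree (non-empty intersection), return the intersection, which is
    contained in every input. If they conflict, fall back to the covering hull so the
    fused interval stays wide and signals high uncertainty.
    """
    if not intervals:
        raise ValueError("need at least one interval")
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    if lower <= upper:
        return (lower, upper)
    return (min(lo for lo, _ in intervals), max(hi for _, hi in intervals))


print(fuse_intervals([(0.2, 0.8), (0.5, 0.9)]))  # (0.5, 0.8)
print(fuse_intervals([(0.1, 0.3), (0.6, 0.9)]))  # (0.1, 0.9)
```

With this rule, agreeing sources tighten the interval, and conflicting sources produce a wide interval instead of a falsely confident one.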
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques
