CASCADE: Context-Aware Relaxation for Speculative Image Decoding
Selin Yildirim, Subhajit Dutta Chowdhury, Mohammad Mahdi Kamani, Vikram Appia, Deming Chen

TL;DR
CASCADE is a context-aware acceptance relaxation method for speculative image decoding. It exploits redundancies in the target model's hidden state representations to accelerate autoregressive image synthesis by up to 3.6x while maintaining image quality.
Contribution
CASCADE formalizes two properties, semantic interchangeability and convergence, that enable principled acceptance relaxation in tree-based speculative decoding, improving efficiency without extra training.
Findings
Achieves up to 3.6x speedup in image decoding.
Maintains image quality and fidelity during acceleration.
Demonstrates effectiveness across multiple models and architectures.
Abstract
Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we identify previously overlooked patterns in the target model's behavior that emerge naturally in tree-based speculative decoding. Specifically, we formalize two properties, semantic interchangeability and convergence, arising from the redundancies in the target model's hidden state representations. By capturing these redundancies across the depth and breadth of the predicted token tree, our method…
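To make the rejection-rate limitation concrete, the sketch below shows the standard (unrelaxed) speculative sampling acceptance test, not CASCADE's relaxed rule, which the truncated abstract does not specify. The function names and the example distributions are illustrative assumptions and do not come from the paper.

```python
import random


def expected_acceptance(p, q):
    """Probability that a token drafted from q passes the standard test against p.

    A draft token x ~ q is accepted with probability min(1, p[x] / q[x]),
    so the overall acceptance probability is sum_x min(p[x], q[x]).
    p is the target distribution, q the draft distribution (same vocabulary).
    """
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def accept_draft_token(token, p, q, rng=random):
    """Standard acceptance test for one draft token (hypothetical helper)."""
    if q[token] <= 0.0:
        raise ValueError("draft token must have non-zero draft probability")
    return rng.random() < min(1.0, p[token] / q[token])


# Illustrative numbers only (assumed, not measured):
# a confident target that mostly agrees with the draft ...
peaked_target = [0.90, 0.05, 0.05]
peaked_draft = [0.80, 0.10, 0.10]

# ... versus an uncertain (flat) target paired with a confident draft.
flat_target = [0.25, 0.25, 0.25, 0.25]
flat_draft = [0.70, 0.10, 0.10, 0.10]

print(expected_acceptance(peaked_target, peaked_draft))  # roughly 0.90
print(expected_acceptance(flat_target, flat_draft))      # roughly 0.55
```

In this toy setting, the flat target distribution accepts the draft's preferred token far less often, which is the kind of rejection behavior that motivates relaxing the acceptance criterion.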
