Multi-Scale Local Speculative Decoding for Image Generation
Elia Peruzzo, Guillaume Sautière, Amirhossein Habibian

TL;DR
This paper introduces MuLo-SD, a multi-scale speculative decoding framework that accelerates autoregressive image generation by combining low-resolution drafting with spatially aware verification, achieving up to 1.7x speedup while maintaining quality.
Contribution
The paper presents a novel multi-scale speculative decoding method with local rejection and resampling, improving speed and accuracy in autoregressive image synthesis.
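The summary does not spell out how "local" rejection differs from the usual approach. The sketch below is an illustration under stated assumptions, not the authors' implementation. It contrasts a raster-scan baseline, where every token at or after the first rejected token (in row-major order) is discarded, with a hypothetical local variant that resamples only rejected tokens and a square window around each one. The helper names and the `radius` parameter are invented for illustration; MuLo-SD's actual neighborhood definition may differ.

```python
import numpy as np


def raster_resample_mask(accepted: np.ndarray) -> np.ndarray:
    """Raster-scan baseline: discard every token at or after the first
    rejection in row-major order. `accepted` is an (H, W) bool grid."""
    flat = accepted.ravel()
    mask = np.zeros(flat.shape, dtype=bool)
    rejected = np.flatnonzero(~flat)
    if rejected.size:
        mask[rejected[0]:] = True  # everything after the first rejection
    return mask.reshape(accepted.shape)


def local_resample_mask(accepted: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical local variant: mark only rejected tokens and the
    tokens inside a (2*radius+1)^2 window around each of them."""
    h, w = accepted.shape
    mask = np.zeros((h, w), dtype=bool)
    for r, c in zip(*np.nonzero(~accepted)):
        mask[max(r - radius, 0):min(r + radius + 1, h),
             max(c - radius, 0):min(c + radius + 1, w)] = True
    return mask


# Usage: a 6x6 token grid with a single rejected token at row 1, column 2.
accepted = np.ones((6, 6), dtype=bool)
accepted[1, 2] = False
print(raster_resample_mask(accepted).sum())  # 28 tokens regenerated
print(local_resample_mask(accepted).sum())   # 9 tokens regenerated
```

With one early rejection, the raster-scan rule throws away most of the image, while the local rule touches only a 3x3 patch. This gap is the kind of saving a spatially aware resampling scheme targets.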
Findings
Achieves up to 1.7x speedup over baselines.
Maintains semantic and perceptual quality comparable to existing methods.
Validated on MS-COCO dataset using multiple evaluation metrics.
Abstract
Autoregressive (AR) models have achieved remarkable success in image synthesis, yet their sequential nature imposes significant latency constraints. Speculative Decoding offers a promising avenue for acceleration, but existing approaches are limited by token-level ambiguity and lack of spatial awareness. In this work, we introduce Multi-Scale Local Speculative Decoding (MuLo-SD), a novel framework that combines multi-resolution drafting with spatially informed verification to accelerate AR image generation. Our method leverages a low-resolution drafter paired with learned up-samplers to propose candidate image tokens, which are then verified in parallel by a high-resolution target model. Crucially, we incorporate a local rejection and resampling mechanism, enabling efficient correction of draft errors by focusing on spatial neighborhoods rather than raster-scan resampling after the…
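For readers unfamiliar with the verification step, the sketch below shows the standard speculative-sampling test from the LLM literature: accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the renormalized residual max(0, p − q). It assumes the drafter probabilities `q` and the target probabilities `p` are already available for every drafted position. In the setting described here, a single parallel forward pass of the high-resolution target would produce them. Whether MuLo-SD uses exactly this rule is an assumption. The function name and array layout are hypothetical, and the sketch covers only the per-token test, not which neighboring tokens are invalidated afterwards.

```python
import numpy as np


def verify_drafts(draft_tokens: np.ndarray, q: np.ndarray, p: np.ndarray,
                  rng: np.random.Generator):
    """Standard speculative-sampling verification of N drafted tokens.

    draft_tokens: (N,) token ids sampled from the drafter.
    q: (N, V) drafter probabilities at each position.
    p: (N, V) target probabilities at the same positions.
    Returns (accepted mask, tokens with rejected positions resampled).
    """
    idx = np.arange(len(draft_tokens))
    p_x = p[idx, draft_tokens]
    q_x = q[idx, draft_tokens]  # > 0, since the token was sampled from q
    accept_prob = np.minimum(1.0, p_x / q_x)
    accepted = rng.random(len(draft_tokens)) < accept_prob
    tokens = draft_tokens.copy()
    for i in np.flatnonzero(~accepted):
        # A rejection implies p[i] != q[i], so the residual has positive mass.
        residual = np.maximum(p[i] - q[i], 0.0)
        residual /= residual.sum()
        tokens[i] = rng.choice(len(residual), p=residual)
    return accepted, tokens


# Usage: position 0 agrees with the target; position 1 drafts a token
# the target assigns zero probability, so it is rejected and resampled.
rng = np.random.default_rng(0)
q = np.array([[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]])
p = np.array([[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]])
accepted, tokens = verify_drafts(np.array([1, 0]), q, p, rng)
print(accepted.tolist(), tokens.tolist())  # [True, False] [1, 1]
```

This rule guarantees that the accepted-or-resampled token at each position is distributed exactly as the target model would sample it. That is why speculative decoding can speed up generation without changing the output distribution.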
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Digital Media Forensic Detection
