ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
Hyungjin Chung, Dohun Lee, Jong Chul Ye

TL;DR
ACDC is a novel zero-shot method that combines autoregressive and diffusion models to improve long-sequence multimodal generation by correcting artifacts and preserving global context without additional training.
Contribution
The paper introduces a memory-augmented approach that integrates ARMs and DMs at inference time, enabling high-quality, coherent multimodal generation without fine-tuning.
Findings
Effective error mitigation in long-sequence multimodal generation
Significant quality improvements over baseline models
Versatile across different architectures and tasks
Abstract
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading paradigms in generative modeling, each excelling in distinct areas: ARMs in global context modeling and long-sequence generation, and DMs in generating high-quality local contexts, especially for continuous data such as images and short videos. However, ARMs often suffer from exponential error accumulation over long sequences, leading to physically implausible results, while DMs are limited by their local context generation capabilities. In this work, we introduce Autoregressive Coherent multimodal generation with Diffusion Correction (ACDC), a zero-shot approach that combines the strengths of both ARMs and DMs at the inference stage without the need for additional fine-tuning. ACDC leverages ARMs for global context generation and memory-conditioned DMs for local correction, ensuring high-quality outputs by…
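The division of labor the abstract describes, an ARM proposing the next piece of a sequence and a memory-conditioned corrector refining it, can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: `toy_arm_step`, `toy_dm_correct`, `summarize`, and `generate` are hypothetical stand-ins that operate on lists of floats, and the "memory" here is simply a running mean, since this summary does not say how ACDC builds its memory or conditions the DM.

```python
import random
from typing import List

Chunk = List[float]


def toy_arm_step(history: List[Chunk], rng: random.Random) -> Chunk:
    """Hypothetical ARM stand-in: propose the next chunk from the previous
    one plus Gaussian noise, so errors can compound over many steps."""
    prev = history[-1]
    return [x + rng.gauss(0.0, 0.5) for x in prev]


def summarize(history: List[Chunk]) -> Chunk:
    """Hypothetical memory: element-wise mean of every chunk so far."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]


def toy_dm_correct(chunk: Chunk, memory: Chunk, strength: float = 0.5) -> Chunk:
    """Hypothetical corrector stand-in (not a real diffusion model):
    blend the proposal with the memory summary."""
    return [(1.0 - strength) * x + strength * m for x, m in zip(chunk, memory)]


def generate(first: Chunk, steps: int, correct: bool,
             strength: float = 0.5, seed: int = 0) -> List[Chunk]:
    """Alternate autoregressive proposal and optional local correction."""
    rng = random.Random(seed)
    history = [list(first)]
    for _ in range(steps):
        proposal = toy_arm_step(history, rng)
        if correct:
            proposal = toy_dm_correct(proposal, summarize(history), strength)
        history.append(proposal)
    return history


if __name__ == "__main__":
    raw = generate([0.0, 1.0], steps=20, correct=False)
    fixed = generate([0.0, 1.0], steps=20, correct=True)
    print("last uncorrected chunk:", raw[-1])
    print("last corrected chunk:  ", fixed[-1])
```

In this toy setting the correction step only pulls each proposal toward a summary of earlier chunks; a real corrector would be a pretrained diffusion model, and nothing here reflects the paper's reported results.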
Taxonomy
Topics: Magnetic Bearings and Levitation Dynamics
Methods: Diffusion
