Adapting VACE for Real-Time Autoregressive Video Diffusion
Ryan Fosdick (Daydream)

TL;DR
This paper adapts the VACE video generation model for real-time autoregressive applications by restructuring how it conditions on reference frames. The result is streaming video synthesis with a modest latency increase (20-30%) and no additional training.
Contribution
It introduces an adaptation of VACE that moves reference frames out of the diffusion latent space and into a parallel conditioning pathway. This preserves the fixed chunk sizes and causal attention that autoregressive generation requires, without retraining.
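The following minimal sketch shows why routing references into a separate pathway keeps chunk sizes fixed. It makes two assumptions for illustration: a naive port prepends reference latents to the first chunk along the time axis, and every latent frame patchifies into the same number of tokens. `ChunkSpec`, `main_stream_chunk_sizes`, and `conditioning_tokens` are hypothetical names, and the sizes used are illustrative only. None of this is the paper's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkSpec:
    """Illustrative chunk geometry; the numbers used below are hypothetical."""
    latent_frames: int     # latent frames denoised per autoregressive step
    tokens_per_frame: int  # patchified tokens per latent frame


def main_stream_chunk_sizes(spec: ChunkSpec, num_chunks: int,
                            num_ref_frames: int, refs_in_latent: bool) -> List[int]:
    """Token count of each chunk entering the causal DiT.

    refs_in_latent=True: hypothetical naive port that prepends reference
    latents to the first chunk along the time axis.
    refs_in_latent=False: references are routed to a separate conditioning
    pathway, so every chunk in the main stream has the same size.
    """
    sizes = []
    for i in range(num_chunks):
        frames = spec.latent_frames
        if refs_in_latent and i == 0:
            frames += num_ref_frames  # first chunk grows by the reference frames
        sizes.append(frames * spec.tokens_per_frame)
    return sizes


def conditioning_tokens(spec: ChunkSpec, num_ref_frames: int) -> int:
    """Reference tokens handled by the parallel conditioning pathway instead."""
    return num_ref_frames * spec.tokens_per_frame


spec = ChunkSpec(latent_frames=3, tokens_per_frame=100)
print(main_stream_chunk_sizes(spec, 4, num_ref_frames=1, refs_in_latent=True))   # [400, 300, 300, 300]
print(main_stream_chunk_sizes(spec, 4, num_ref_frames=1, refs_in_latent=False))  # [300, 300, 300, 300]
print(conditioning_tokens(spec, num_ref_frames=1))                                # 100
```

In the naive variant, the first chunk differs in size from every later one, which conflicts with a pipeline built around one fixed chunk shape. In the conditioning-pathway variant, the main stream stays uniform and the reference tokens are accounted for separately.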
Findings
20-30% latency increase for structural control and inpainting
Negligible VRAM overhead compared to base model
Degraded reference-to-video fidelity due to causal attention constraints
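To make the latency finding concrete, the arithmetic below converts a per-chunk latency overhead into a throughput change. It assumes chunks are generated back to back, so frames per second scale inversely with per-chunk latency. The 20 fps base rate is a hypothetical figure, not a number reported in the paper.

```python
def throughput_after_overhead(base_fps: float, latency_overhead: float) -> float:
    """Frames per second after per-chunk latency grows by `latency_overhead`.

    latency_overhead=0.2 means +20% latency. Assumes chunks are generated back
    to back, so throughput is inversely proportional to per-chunk latency.
    """
    if latency_overhead < 0:
        raise ValueError("latency_overhead must be non-negative")
    return base_fps / (1.0 + latency_overhead)


def throughput_loss(latency_overhead: float) -> float:
    """Fraction of throughput lost: 1 - 1 / (1 + overhead) = overhead / (1 + overhead)."""
    return latency_overhead / (1.0 + latency_overhead)


# Hypothetical 20 fps base pipeline (not a number from the paper).
for overhead in (0.20, 0.30):
    fps = throughput_after_overhead(20.0, overhead)
    print(f"+{overhead:.0%} latency -> {fps:.1f} fps ({throughput_loss(overhead):.1%} fewer frames/s)")
```

Under this assumption, a 20-30% latency overhead corresponds to roughly 17-23% fewer frames per second, whatever the base rate.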
Abstract
We describe an adaptation of VACE (Video All-in-one Creation and Editing) for real-time autoregressive video generation. VACE provides unified video control (reference guidance, structural conditioning, inpainting, and temporal extension) but assumes bidirectional attention over full sequences, making it incompatible with streaming pipelines that require fixed chunk sizes and causal attention. The key modification moves reference frames from the diffusion latent space into a parallel conditioning pathway, preserving the fixed chunk sizes and KV caching that autoregressive models require. This adaptation reuses existing pretrained VACE weights without additional training. Across 1.3B and 14B model scales, VACE adds 20-30% latency overhead for structural control and inpainting, with negligible VRAM cost relative to the base model. Reference-to-video fidelity is severely degraded compared…
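The abstract contrasts VACE's bidirectional attention over full sequences with the chunked causal attention and KV caching used in streaming pipelines. The minimal sketch below illustrates that contrast. It assumes a block-causal scheme, which is common in chunk-wise autoregressive video diffusion: tokens attend freely within their own chunk and only to earlier chunks. `block_causal_mask` and `ChunkKVCache` are hypothetical helpers, and strings stand in for key/value tensors.

```python
from typing import List


def block_causal_mask(num_chunks: int, chunk_size: int) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Tokens attend bidirectionally inside their own chunk and causally to
    every earlier chunk, never to later chunks.
    """
    n = num_chunks * chunk_size
    return [[(k // chunk_size) <= (q // chunk_size) for k in range(n)]
            for q in range(n)]


class ChunkKVCache:
    """Toy KV cache that only accepts fixed-size chunks (hypothetical helper)."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.keys: List[str] = []  # strings stand in for cached key tensors

    def append_chunk(self, chunk_keys: List[str]) -> None:
        # A fixed chunk size keeps cache offsets and attention shapes predictable.
        if len(chunk_keys) != self.chunk_size:
            raise ValueError(f"expected {self.chunk_size} tokens, got {len(chunk_keys)}")
        self.keys.extend(chunk_keys)


for row in block_causal_mask(num_chunks=2, chunk_size=2):
    print("".join("x" if allowed else "." for allowed in row))
# xx..
# xx..
# xxxx
# xxxx

cache = ChunkKVCache(chunk_size=2)
cache.append_chunk(["k0", "k1"])
cache.append_chunk(["k2", "k3"])
print(len(cache.keys))  # 4
try:
    cache.append_chunk(["ref0", "k4", "k5"])  # extra reference token breaks the fixed size
except ValueError as err:
    print(err)  # expected 2 tokens, got 3
```

A chunk that also carries reference latents no longer matches the cache's fixed shape. Keeping references out of the main token stream avoids exactly this mismatch.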
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Human Pose and Action Recognition
