Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

Hengyuan Cao, Yutong Feng, Biao Gong, Yijing Tian, Yunhong Lu, Chuang Liu, and Bin Wang

arXiv:2505.23325 · cs.CV · May 30, 2025

TL;DR

This paper introduces DRA-Ctrl, a method that repurposes video generative models for controllable image synthesis, using their ability to model dynamic scenes to produce high-quality results on lower-dimensional image tasks.

Contribution

The paper proposes a new paradigm for video-to-image knowledge transfer, including a mixup transition strategy and a tailored attention mechanism, enabling video models to excel in image generation tasks.

Findings

01 Video models outperform image-trained models in controllable image synthesis.

02 The proposed method achieves a smooth transition from video to image generation.

03 Repurposed video models demonstrate untapped potential for diverse visual tasks.

Abstract

Video generative models can be regarded as world simulators due to their ability to capture the dynamic, continuous changes inherent in real-world environments. These models integrate high-dimensional information across visual, temporal, spatial, and causal dimensions, enabling predictions of subjects in various states. A natural and valuable research direction is to explore whether a fully trained video generative model in high-dimensional space can effectively support lower-dimensional tasks such as controllable image generation. In this work, we propose a paradigm for video-to-image knowledge compression and task adaptation, termed Dimension-Reduction Attack (DRA-Ctrl), which utilizes the strengths of video models, including long-range context modeling and flattened full attention, to perform various generation tasks. Specifically, to address the challenging gap between…

Code & Models

Models

🤗 Kunbyte/DRA-Ctrl
model · 226 dl · ♡ 18

Videos

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis · slideslive

Taxonomy

Methods

Softmax · Attention Is All You Need · ALIGN