LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models
Shu Yu, Chaochao Lu

TL;DR
LINA is a novel framework that improves diffusion models' physical alignment and out-of-distribution instruction following by learning prompt-specific interventions and causality-aware denoising schedules.
Contribution
The paper introduces a causal analysis framework, built on the Causal Scene Graph (CSG) and the Physical Alignment Probe (PAP) dataset, and a new intervention learning method to enhance diffusion models' reasoning and generalization capabilities.
Findings
Achieves state-of-the-art results on causal generation tasks.
Improves out-of-distribution instruction following.
Enhances physical alignment in image and video generation.
Abstract
Diffusion models (DMs) have achieved remarkable success in image and video generation. However, they still struggle with (1) physical alignment and (2) out-of-distribution (OOD) instruction following. We argue that these issues stem from the models' failure to learn causal directions and to disentangle causal factors for novel recombination. We introduce the Causal Scene Graph (CSG) and the Physical Alignment Probe (PAP) dataset to enable diagnostic interventions. This analysis yields three key insights. First, DMs struggle with multi-hop reasoning for elements not explicitly determined in the prompt. Second, the prompt embedding contains disentangled representations for texture and physics. Third, visual causal structure is disproportionately established during the initial, computationally limited denoising steps. Based on these findings, we introduce LINA (Learning INterventions…
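The third insight above, that visual causal structure forms mostly in the first few denoising steps, can be illustrated with a sampling schedule that spends more of a fixed step budget at high noise levels. The following is a minimal sketch under stated assumptions, not LINA's actual schedule, which the truncated abstract does not specify. The function name `front_loaded_timesteps`, the `power` parameter, and the default of 1000 training timesteps are all hypothetical choices made for illustration.

```python
import numpy as np


def front_loaded_timesteps(num_steps, num_train_timesteps=1000, power=2.0):
    """Return descending integer timesteps, spaced more densely near the
    high-noise start of sampling when power > 1.

    Hypothetical illustration only: this is NOT the schedule used by LINA.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if power <= 0:
        raise ValueError("power must be positive")

    # u runs from 0 (first sampling step) to 1 (last sampling step).
    u = np.linspace(0.0, 1.0, num_steps)

    # For power > 1, u**power grows slowly near u = 0, so consecutive
    # timesteps stay close together early on (small gaps at high noise)
    # and spread out later (large gaps at low noise).
    t = (num_train_timesteps - 1) * (1.0 - u ** power)

    return [int(round(float(x))) for x in t]


if __name__ == "__main__":
    # Compare a uniform schedule (power=1) with a front-loaded one (power=2).
    print("uniform:     ", front_loaded_timesteps(5, power=1.0))
    print("front-loaded:", front_loaded_timesteps(5, power=2.0))
```

With `power=1.0` the steps are evenly spaced. Larger values shift more of the budget toward the early, high-noise steps, which is the region the abstract identifies as both important and under-resourced.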
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
