V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
Han Lin, Xichen Pan, Zun Wang, Yue Zhang, Chu Wang, Jaemin Cho, Mohit Bansal

TL;DR
V-Co systematically studies visual co-denoising in pixel-space diffusion models and identifies four key ingredients that improve semantic alignment; the resulting model outperforms baseline pixel-space diffusion models on ImageNet-256.
Contribution
This paper introduces a unified framework for visual co-denoising, clarifies essential design choices, and provides a practical recipe for enhancing diffusion models with visual feature alignment.
Findings
V-Co outperforms baseline pixel-space diffusion models on ImageNet-256.
Four key ingredients are identified for effective visual co-denoising.
V-Co achieves better results with fewer training epochs.
Abstract
Pixel-space diffusion has recently re-emerged as a strong alternative to latent diffusion, enabling high-quality generation without pretrained autoencoders. However, standard pixel-space diffusion models receive relatively weak semantic supervision and are not explicitly designed to capture high-level visual structure. Recent representation-alignment methods (e.g., REPA) suggest that pretrained visual features can substantially improve diffusion training, and visual co-denoising has emerged as a promising direction for incorporating such features into the generative process. However, existing co-denoising approaches often entangle multiple design choices, making it unclear which design choices are truly essential. Therefore, we present V-Co, a systematic study of visual co-denoising in a unified JiT-based framework. This controlled setting allows us to isolate the ingredients that make…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Image Enhancement Techniques
