REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion
Giorgos Petsangourakis, Christos Sgouropoulos, Bill Psomas, Theodoros Giannakopoulos, Giorgos Sfikas, Ioannis Kakogeorgiou

TL;DR
REGLUE is a unified latent diffusion framework that jointly models VAE image latents, local (patch-level) semantics, and a global (image-level) [CLS] token, with the semantics taken from Vision Foundation Models (VFMs). Making fuller use of these VFM features improves image synthesis quality and speeds up convergence.
Contribution
The paper proposes REGLUE, a joint modeling approach that entangles VAE latents with multi-layer VFM semantics, using a lightweight convolutional semantic compressor together with external representation alignment to improve diffusion model performance.
Findings
Improves FID scores on ImageNet 256x256.
Accelerates convergence compared to baseline models.
Highlights importance of spatial semantics and non-linear compression.
Abstract
Latent diffusion models (LDMs) achieve state-of-the-art image synthesis, yet their reconstruction-style denoising objective provides only indirect semantic supervision: high-level semantics emerge slowly, requiring longer training and limiting sample quality. Recent works inject semantics from Vision Foundation Models (VFMs) either externally via representation alignment or internally by jointly modeling only a narrow slice of VFM features inside the diffusion process, under-utilizing the rich, nonlinear, multi-layer spatial semantics available. We introduce REGLUE (Representation Entanglement with Global-Local Unified Encoding), a unified latent diffusion framework that jointly models (i) VAE image latents, (ii) compact local (patch-level) VFM semantics, and (iii) a global (image-level) [CLS] token within a single SiT backbone. A lightweight convolutional semantic compressor…
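The joint modeling of the three streams named in the abstract can be pictured with a small, dependency-free sketch. It is an illustration only: the abstract does not say how the streams are combined, so the channel-wise concatenation, the patch size, the toy tensor sizes, and the function names `concat_channels`, `patchify`, and `build_joint_sequence` are all assumptions introduced here, not the authors' implementation.

```python
# Hypothetical sketch: one plausible way to lay out VAE latents, compressed
# local semantics, and a global token as inputs to a single transformer.
# Feature maps are nested lists shaped [channels][height][width].


def concat_channels(latent, semantics):
    """Stack two [C][H][W] maps along the channel axis (assumed fusion rule)."""
    same_height = len(latent[0]) == len(semantics[0])
    same_width = len(latent[0][0]) == len(semantics[0][0])
    if not (same_height and same_width):
        raise ValueError("latent and semantic maps must share spatial size")
    return latent + semantics


def patchify(fmap, patch):
    """Flatten non-overlapping patch x patch windows into one token each."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    if height % patch or width % patch:
        raise ValueError("spatial size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                fmap[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tokens


def build_joint_sequence(latent, semantics, global_token, patch=2):
    """Return the spatial tokens plus the image-level token, kept separate
    because their raw dimensions differ before any learned projection."""
    spatial = patchify(concat_channels(latent, semantics), patch)
    return {"global": list(global_token), "spatial": spatial}
```

With a toy 4-channel 4x4 latent, a 2-channel 4x4 semantic map, and `patch=2`, this yields 4 spatial tokens of size (4 + 2) * 2 * 2 = 24 plus one global token. In a real model each stream would pass through its own learned projection to the backbone width before being denoised together; that step is omitted here.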
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Computer Graphics and Visualization Techniques
