Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models
Rowan Bradbury, Dazhi Zhong

TL;DR
This paper introduces Pixel-Equivalent Latent Compositing (PELC), a method that enables high-fidelity, full-resolution mask control in diffusion model inpainting by ensuring latent compositing matches pixel-space results.
Contribution
The paper proposes PELC and DecFormer, a transformer-based model that achieves pixel-equivalent latent compositing, improving inpainting quality without extensive retraining or backbone modifications.
Findings
DecFormer reduces edge error metrics by up to 53% compared to standard methods.
DecFormer achieves high-fidelity inpainting comparable to fully finetuned models.
PELC enables general pixel-equivalent latent editing beyond inpainting.
Abstract
Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents: Pixel-Equivalent Latent Compositing (PELC). An equivalent latent compositor should be the same as compositing in pixel space. This principle enables full-resolution mask control and true soft-edge alpha compositing, even though VAEs compress images 8x spatially. Modern VAEs capture global context beyond patch-aligned local structure, so linear latent blending cannot be pixel-equivalent: it produces large artifacts at mask seams and global degradation and color shifts. We introduce DecFormer, a 7.7M-parameter transformer that predicts per-channel blend weights and an off-manifold residual correction to realize mask-consistent latent fusion. DecFormer is trained so that decoding after fusion matches…
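The baseline the abstract criticizes, and the pixel-space target that PELC asks a latent compositor to match, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the channels-last array layout, the average-pool mask downsampling, and the `decode` callable are all hypothetical. The last function shows one plausible reading of "per-channel blend weights and a residual correction"; the actual DecFormer formulation may differ.

```python
import numpy as np

def composite_pixels(fg, bg, mask):
    """Alpha-composite in pixel space. fg, bg: (H, W, C); mask: (H, W) in [0, 1]."""
    a = mask[..., None]
    return a * fg + (1.0 - a) * bg

def downsample_mask(mask, factor=8):
    """Average-pool an (H, W) mask by `factor` (assumed pooling; H, W divisible by factor)."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def blend_latents_linear(z_fg, z_bg, mask, factor=8):
    """Baseline: linearly interpolate latents (h, w, c) under the downsampled mask."""
    m = downsample_mask(mask, factor)[..., None]
    return m * z_fg + (1.0 - m) * z_bg

def fuse_with_weights(z_fg, z_bg, weights, residual):
    """Assumed fusion form: per-channel weights (h, w, c) plus an additive residual.
    In the paper these would be predicted by a network; here they are plain inputs."""
    return weights * z_fg + (1.0 - weights) * z_bg + residual

def pixel_equivalence_error(decode, z_fused, fg, bg, mask):
    """Mean squared gap between the decoded fused latent and the pixel-space composite.
    `decode` is any callable mapping a latent (h, w, c) to an image (H, W, C)."""
    target = composite_pixels(fg, bg, mask)
    return float(np.mean((decode(z_fused) - target) ** 2))
```

Note how `downsample_mask` discards sub-cell detail: a hard mask edge that falls in the middle of an 8x8 cell becomes a single fractional value for the whole cell, which is one reason the linear baseline cannot honor a full-resolution mask.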
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications · Model Reduction and Neural Networks
