TL;DR
IMAGHarmony is a diffusion-based image editing framework that preserves object counts and spatial layouts when modifying multi-object scenes, keeping edits structurally and semantically consistent.
Contribution
It introduces a harmony-aware module and a preference-guided noise strategy, enabling efficient, structure-preserving multi-object image editing with minimal training data.
Findings
Outperforms existing methods in structural preservation and semantic accuracy.
Requires only 200 training images and 10.6M trainable parameters.
Provides a new benchmark, HarmonyBench, for evaluating multi-object editing.
Abstract
Despite advances in diffusion-based image editing, manipulating multi-object scenes remains challenging. Existing approaches often achieve semantic changes at the expense of structural consistency, failing to preserve exact object counts and spatial layouts without introducing unintended relocations or background modifications. To address this limitation, we introduce quantity-and-layout-consistent image editing (QL-Edit) to modify object semantics while maintaining the original instance cardinality and spatial layout. We propose IMAGHarmony, a parameter-efficient framework featuring a harmony-aware (HA) module that incorporates perception cues from the reference image into the diffusion process. This enables the model to jointly reason about object semantics, counts, and spatial positions for improved structural consistency. Furthermore, we introduce a preference-guided noise selection…
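To make the QL-Edit goal concrete, the sketch below checks the two properties the abstract names: that the edited image keeps the same number of object instances, and that those instances stay where they were. It is only an illustration, not the paper's evaluation protocol or a HarmonyBench metric. The function names `iou` and `layout_consistency`, the box format, and the greedy matching are all assumptions made for this example, and the boxes are assumed to come from some external object detector.

```python
# Illustrative check of quantity and layout consistency between a source
# image and its edited version. All names here are hypothetical; boxes are
# assumed to be (x_min, y_min, x_max, y_max) tuples from any detector.

Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_consistency(src: list[Box], edited: list[Box]) -> tuple[bool, float]:
    """Return (count_preserved, mean IoU of greedily matched boxes).

    If the instance counts differ, the layout score is reported as 0.0.
    Two empty scenes are treated as perfectly consistent.
    """
    if len(src) != len(edited):
        return False, 0.0
    if not src:
        return True, 1.0

    remaining = list(edited)
    total = 0.0
    for s in src:
        # Greedy matching: pair each source box with the best unmatched box.
        best = max(remaining, key=lambda e: iou(s, e))
        total += iou(s, best)
        remaining.remove(best)
    return True, total / len(src)


if __name__ == "__main__":
    source_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
    edited_boxes = [(20, 0, 30, 10), (0, 0, 10, 10)]  # same layout, reordered
    print(layout_consistency(source_boxes, edited_boxes))
```

An edit that replaces two cats with two dogs in the same positions would score a preserved count and a mean IoU near 1, while an edit that drops or relocates an object would fail the count check or lower the IoU. Greedy matching keeps the sketch short; an optimal assignment (for example, Hungarian matching) would be more robust for crowded scenes.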
