IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout

Fei Shen; Yutong Gao; Jian Yu; Xiaoyu Du; Jinhui Tang

arXiv:2506.01949 · cs.CV · March 30, 2026

TL;DR

IMAGHarmony is a novel diffusion-based image editing framework that preserves object counts and spatial layouts during multi-object scene modifications, ensuring structural and semantic consistency.

Contribution

IMAGHarmony introduces a harmony-aware (HA) module and a preference-guided noise selection strategy, enabling efficient, structure-preserving multi-object image editing with little training data.

Findings

01. Outperforms existing methods in structural preservation and semantic accuracy.

02. Requires only 200 training images and 10.6M trainable parameters.

03. Provides a new benchmark, HarmonyBench, for evaluating multi-object editing.

Abstract

Despite advances in diffusion-based image editing, manipulating multi-object scenes remains challenging. Existing approaches often achieve semantic changes at the expense of structural consistency, failing to preserve exact object counts and spatial layouts without introducing unintended relocations or background modifications. To address this limitation, we introduce quantity-and-layout-consistent image editing (QL-Edit) to modify object semantics while maintaining the original instance cardinality and spatial layout. We propose IMAGHarmony, a parameter-efficient framework featuring a harmony-aware (HA) module that incorporates perception cues from the reference image into the diffusion process. This enables the model to jointly reason about object semantics, counts, and spatial positions for improved structural consistency. Furthermore, we introduce a preference-guided noise selection…
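To make the QL-Edit constraint concrete, the sketch below checks whether an edit kept the object count and roughly kept the layout, given bounding boxes for the source and edited images. It is an illustration only, not the paper's metric or the HarmonyBench protocol: the function names `iou` and `layout_consistency` are hypothetical, the boxes are assumed to come from some external object detector, and the brute-force matching is only practical for a handful of objects.

```python
from itertools import permutations
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed convention.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_consistency(src: List[Box], edited: List[Box]) -> Tuple[bool, float]:
    """Hypothetical check: return (count_preserved, mean IoU under best matching)."""
    if len(src) != len(edited):
        # Object quantity changed, so the edit already violates the constraint.
        return False, 0.0
    if not src:
        return True, 1.0
    # Try every one-to-one assignment of edited boxes to source boxes and
    # keep the one with the highest total overlap (factorial cost).
    best_total = max(
        sum(iou(s, e) for s, e in zip(src, perm))
        for perm in permutations(edited)
    )
    return True, best_total / len(src)
```

A mean IoU near 1 with the count preserved suggests objects stayed where they were; a lower score points to relocated or resized instances, which is the failure mode the abstract describes.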

Code & Models

Repositories

muzishen/IMAGHarmony (GitHub)

Models

kkkkggg/IMAGHarmony (Hugging Face model)
