MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection

Shuyu Wang; Weiqi Li; Qian Wang; Shijie Zhao; Jian Zhang

arXiv:2505.19149·cs.CV·May 27, 2025

TL;DR

MIND-Edit is an image editing framework that combines a pretrained diffusion model with a multimodal large language model (MLLM) to improve semantic accuracy and visual coherence in complex editing tasks.

Contribution

The paper introduces an end-to-end approach that uses the MLLM's visual understanding and semantic reasoning to produce more precise, semantically aligned image edits.

Findings

01. Outperforms state-of-the-art methods in quantitative metrics.
02. Achieves more visually coherent edits in complex scenarios.
03. Enhances instruction interpretation accuracy.

Abstract

Recent advances in AI-generated content (AIGC) have significantly accelerated the development of image editing techniques, driving increasing demand for diverse and fine-grained edits. Despite these advances, existing image editing methods still struggle to achieve high precision and semantic accuracy in complex scenarios. Recent studies address this issue by incorporating multimodal large language models (MLLMs) into image editing pipelines. However, current MLLM-based methods mainly rely on interpreting textual instructions and leave the intrinsic visual understanding of large models largely unexplored, which results in insufficient alignment between textual semantics and visual outcomes. To overcome these limitations, we propose MIND-Edit, an end-to-end image editing framework that integrates a pretrained diffusion model with an MLLM. MIND-Edit introduces two complementary strategies: (1) a text…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Diffusion