LGCC: Enhancing Flow Matching Based Text-Guided Image Editing with Local Gaussian Coupling and Context Consistency
Fangbing Liu, Pengfei Duan, Wen Li, Yi He

TL;DR
LGCC introduces local Gaussian coupling and content consistency loss to improve detail preservation, semantic alignment, and efficiency in flow matching-based text-guided image editing, outperforming prior methods in speed and quality.
Contribution
The paper proposes LGCC, a novel framework that enhances flow matching-based image editing by integrating local Gaussian noise coupling and content consistency loss, reducing inference time and improving editing quality.
Findings
Improves local detail scores by 1.60% on I2EBench.
Achieves 3x to 5x speedup in lightweight editing.
Requires only 40-50% of the inference time of BAGEL or Flux.
Abstract
Recent advancements have demonstrated the great potential of flow matching-based Multimodal Large Language Models (MLLMs) in image editing. However, state-of-the-art works like BAGEL face limitations, including detail degradation, content inconsistency, and inefficiency due to their reliance on random noise initialization. To address these issues, we propose LGCC, a novel framework with two key components: Local Gaussian Noise Coupling (LGNC) and Content Consistency Loss (CCL). LGNC preserves spatial details by modeling target image embeddings and their locally perturbed counterparts as coupled pairs, while CCL ensures semantic alignment between edit instructions and image modifications, preventing unintended content removal. By integrating LGCC with the BAGEL pre-trained model via curriculum learning, we significantly reduce inference steps, improving local detail scores on I2EBench by…
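The abstract describes LGNC as pairing target image embeddings with locally perturbed copies of themselves, instead of starting from random noise. The excerpt does not give the exact formulation, so the following is only a minimal NumPy sketch of one way such a coupling could look in a flow matching setup. The function names (`coupled_pair`, `interpolate`), the noise scale `sigma`, the linear interpolation path, and the toy embedding shape are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def coupled_pair(x1, sigma, rng):
    """Build a coupled (source, target) pair (hypothetical helper).

    The source x0 is the target embedding x1 plus small Gaussian noise,
    rather than an independent sample from N(0, I).
    """
    x0 = x1 + sigma * rng.standard_normal(x1.shape)
    return x0, x1

def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant velocity.

    A linear path is assumed here; the paper may use a different one.
    """
    # Reshape t so it broadcasts over all non-batch dimensions.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x1.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # regression target for the velocity network
    return x_t, v_target

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 16))           # stand-in for target image embeddings
x0_local, _ = coupled_pair(x1, sigma=0.1, rng=rng)
x0_random = rng.standard_normal(x1.shape)   # conventional random-noise source

t = rng.uniform(size=4)
x_t, v_target = interpolate(x0_local, x1, t)

# Average distance the flow has to cover under each choice of source.
local_dist = np.linalg.norm(x1 - x0_local, axis=1).mean()
random_dist = np.linalg.norm(x1 - x0_random, axis=1).mean()
print(f"local source distance:  {local_dist:.3f}")
print(f"random source distance: {random_dist:.3f}")
```

In this toy setting the locally coupled source sits much closer to the target than an independent noise sample does, so the path to traverse is shorter. That is one plausible reading of why fewer inference steps could suffice, though the abstract excerpt itself does not spell out the mechanism.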
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques
