VENUS: Visual Editing with Noise Inversion Using Scene Graphs

Thanh-Nhan Vo, Trong-Thuan Nguyen, Tam V. Nguyen, and Minh-Triet Tran

arXiv:2601.07219 · cs.CV · January 13, 2026

TL;DR

VENUS is a training-free framework for scene graph-guided image editing. It combines noise inversion with multimodal large language models to improve fidelity, semantic consistency, and efficiency.

Contribution

VENUS introduces a novel, training-free approach for scene graph-guided image editing that combines noise inversion with multimodal large language models, enhancing controllability and scalability.

Findings

01. Improves background preservation and semantic alignment on PIE-Bench.

02. Reduces per-image runtime from 6-10 minutes to 20-30 seconds.

03. Outperforms state-of-the-art scene graph and text-based editing methods.

Abstract

State-of-the-art text-based image editing models often struggle to balance background preservation with semantic consistency, frequently resulting either in the synthesis of entirely new images or in outputs that fail to realize the intended edits. In contrast, scene graph-based image editing addresses this limitation by providing a structured representation of semantic entities and their relations, thereby offering improved controllability. However, existing scene graph editing methods typically depend on model fine-tuning, which incurs high computational cost and limits scalability. To this end, we introduce VENUS (Visual Editing with Noise inversion Using Scene graphs), a training-free framework for scene graph-guided image editing. Specifically, VENUS employs a split prompt conditioning strategy that disentangles the target object of the edit from its background context, while…
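
To make the idea of a scene graph edit and a split prompt more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract above is truncated and does not say how VENUS builds its prompts, so the `Triple` class, the `edit_graph` and `split_prompts` helpers, and the example scene are all hypothetical. The sketch covers only the bookkeeping step of separating the edited object from its background context; noise inversion and the multimodal large language model are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene graph edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def edit_graph(graph, old, new):
    """Return a copy of the graph with every occurrence of entity `old` renamed to `new`."""
    def swap(name):
        return new if name == old else name
    return [Triple(swap(t.subject), t.predicate, swap(t.obj)) for t in graph]


def split_prompts(graph, target):
    """Split the graph into two text prompts.

    Triples that mention `target` form the target prompt; all other
    triples form the background prompt. This split rule is an assumption
    made for illustration only.
    """
    def to_text(t):
        return f"{t.subject} {t.predicate} {t.obj}"

    target_part = [to_text(t) for t in graph if target in (t.subject, t.obj)]
    background_part = [to_text(t) for t in graph if target not in (t.subject, t.obj)]
    return ", ".join(target_part), ", ".join(background_part)


# Hypothetical scene: replace the cat with a dog, keep everything else.
source = [
    Triple("cat", "sitting on", "sofa"),
    Triple("lamp", "next to", "sofa"),
]
edited = edit_graph(source, old="cat", new="dog")
target_prompt, background_prompt = split_prompts(edited, target="dog")

print(target_prompt)      # dog sitting on sofa
print(background_prompt)  # lamp next to sofa
```

In this toy setup the edit touches a single node, so only the triple that mentions the edited entity changes, while the background prompt stays identical before and after the edit. That is the kind of separation a split conditioning strategy would rely on to preserve unedited regions.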

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship