Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
Jiyuan Wang, Chunyu Lin, Lei Sun, Zhi Cao, Yuyang Yin, Lang Nie, Zhenlong Yuan, Xiangxiang Chu, Yunchao Wei, Kang Liao, Guosheng Lin

TL;DR
This paper introduces RL3DEdit, a reinforcement learning framework that uses 3D priors from a foundation model to achieve multi-view consistent 3D scene editing without requiring extensive 3D-consistent paired data.
Contribution
It proposes an RL-based approach that leverages 3D priors to enforce multi-view consistency in 3D editing, avoiding the need for supervised fine-tuning.
Findings
Achieves stable multi-view consistency in 3D editing.
Outperforms state-of-the-art methods in editing quality.
Demonstrates high efficiency in 3D scene editing.
Abstract
Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, maintaining multi-view consistency in edited results remains challenging, and the extreme scarcity of 3D-consistent editing paired data renders supervised fine-tuning (SFT), the most effective training strategy for editing tasks, infeasible. In this paper, we observe that, while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable, naturally positioning reinforcement learning (RL) as a feasible solution. Motivated by this, we propose RL3DEdit, a single-pass framework driven by RL optimization with novel rewards derived from the 3D foundation model, VGGT. Specifically, we leverage VGGT's robust priors learned from massive real-world data, feed the edited images, and utilize the output confidence maps and pose estimation…
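To make the "verifying is easier than generating" idea concrete, here is a minimal sketch of a scalar reward built from per-view confidence maps. It is an illustration under stated assumptions, not the paper's reward: the function name `confidence_reward`, the plain averaging, and the toy 2x2 maps are all hypothetical, and the pose-estimation signal is left out because the abstract is truncated before it explains how that signal is used.

```python
from statistics import fmean
from typing import Sequence

# Assumption: each view's confidence map is an H x W grid of floats in [0, 1].
ConfidenceMap = Sequence[Sequence[float]]


def confidence_reward(conf_maps: Sequence[ConfidenceMap]) -> float:
    """Toy scalar reward: mean per-pixel confidence, averaged over views.

    Hypothetical stand-in for a consistency reward; the actual reward
    used by RL3DEdit is not specified in the text above.
    """
    if not conf_maps:
        raise ValueError("need at least one view")
    # Average the pixels inside each view first, then average across views.
    per_view = [fmean(v for row in view for v in row) for view in conf_maps]
    return fmean(per_view)


# Made-up confidence maps for two edited views of the same scene.
consistent = [[[0.9, 0.8], [0.9, 1.0]], [[0.9, 0.9], [0.8, 1.0]]]
inconsistent = [[[0.9, 0.8], [0.9, 1.0]], [[0.2, 0.1], [0.3, 0.2]]]

# A set of views judged more confidently should score higher.
print(confidence_reward(consistent) > confidence_reward(inconsistent))
```

In an RL loop, a scalar like this would be computed on the editor's outputs and used to weight the policy update; how that is done in RL3DEdit is beyond what the abstract states.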
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
