VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction

Kaixin Zhu; Yiwen Tang; Yifan Yang; Renrui Zhang; Bohan Zeng; Ziyu Guo; Ruichuan An; Zhou Liu; Qizhi Chen; Delin Qu; Jaehong Yoon; Wentao Zhang

arXiv:2605.15186 · cs.CV · May 20, 2026

TL;DR

VGGT-Edit is a novel feed-forward 3D scene editing framework that uses residual field prediction and depth-synchronized text guidance to enable high-fidelity, consistent, and interactive scene modifications in a single forward pass.

Contribution

VGGT-Edit introduces a residual transformation head with depth-synchronized text injection for native 3D scene editing, surpassing 2D-lifting methods in quality and efficiency.
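
This page does not say what the residual head takes as input or which quantity it offsets. As a rough illustration only, the generic idea behind residual prediction can be sketched as follows: the edited scene is the original scene plus a predicted per-point offset, so any point with a zero residual is left untouched. The function name `apply_residual_field`, the 3D-point representation, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def apply_residual_field(points: List[Point], residuals: List[Point]) -> List[Point]:
    """Add a per-point offset to each original point (generic residual update)."""
    if len(points) != len(residuals):
        raise ValueError("points and residuals must have the same length")
    return [
        (px + rx, py + ry, pz + rz)
        for (px, py, pz), (rx, ry, rz) in zip(points, residuals)
    ]


# Hypothetical toy scene: three points, only the second one is edited.
scene = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
residual = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0)]

edited = apply_residual_field(scene, residual)
# Points with a zero residual are identical to the input points.
```

Under this reading, an edit only has to describe what changes, which is one plausible reason a residual formulation could help preserve unedited regions; whether VGGT-Edit works this way is an assumption here.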

Findings

01. Outperforms 2D-lifting baselines in detail and consistency.
02. Achieves near-instant inference speed.
03. Produces sharper object details and better multi-view consistency.

Abstract

High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward architectures, enabling the generation of complex environments in a single forward pass. However, despite their strong performance in static scene perception, these models remain limited in responding to dynamic human instructions, which restricts their use in interactive applications. Existing editing methods typically rely on a 2D-lifting strategy, where individual views are edited independently and then lifted back into 3D space. This indirect pipeline often leads to blurry textures and inconsistent geometry, as 2D editors lack the spatial awareness required to preserve structure across viewpoints. To address these limitations, we propose VGGT-Edit, a feed-forward framework for text-conditioned native 3D scene editing. VGGT-Edit introduces depth-synchronized text injection to align semantic…

Code & Models

Repositories

https://chriszkxxx.github.io/VGGT-Edit
github

Datasets

aoiandroid/papers
dataset · 55 dl
