Edit3r: Instant 3D Scene Editing from Sparse Unposed Images

Jiageng Liu, Weijie Lyu, Xueting Li, Yejie Guo, Ming-Hsuan Yang

arXiv:2512.25071 · cs.CV · January 1, 2026

TL;DR

Edit3r is a fast, feed-forward framework that enables instant 3D scene editing from sparse, unposed images by predicting instruction-aligned edits without requiring scene-specific optimization.

Contribution

Edit3r introduces a training strategy that combines SAM2-based recoloring with asymmetric input pairing, enabling 3D editing from unposed images without ground-truth multi-view-consistent edited images for supervision.
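As a rough illustration of why recoloring can yield cross-view-consistent supervision, the sketch below applies one fixed color to the same object in every view. It assumes the per-view object masks have already been produced by SAM2 (SAM2 is not called here), and the function `recolor_views`, the `alpha` blend weight, and the flat-color blend rule are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def recolor_views(images, masks, target_rgb, alpha=0.8):
    """Blend one fixed color into a tracked object in every view (hypothetical sketch).

    images: list of HxWx3 float arrays with values in [0, 1]
    masks: list of HxW boolean arrays, one per view; assumed to come from
        SAM2 tracking the same object across views (not computed here)
    target_rgb: the single color used in every view, which is what makes
        the edit identical across views
    alpha: hypothetical blend weight; 1.0 gives a flat fill
    """
    if len(images) != len(masks):
        raise ValueError("need exactly one mask per view")
    target = np.asarray(target_rgb, dtype=np.float64)
    edited = []
    for image, mask in zip(images, masks):
        out = np.array(image, dtype=np.float64)  # copy, so inputs stay untouched
        # Only pixels inside the mask change; the background is preserved.
        out[mask] = (1.0 - alpha) * out[mask] + alpha * target
        edited.append(out)
    return edited
```

Because every view receives the identical target color inside its mask, the edited views agree with each other by construction, which is the property the supervision needs; a practical version would likely also preserve shading rather than blending in a flat color.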

Findings

01. Achieves stronger semantic alignment and 3D consistency than prior methods.
02. Runs significantly faster at inference than prior methods that require per-scene optimization.
03. Handles images edited by 2D methods such as InstructPix2Pix at inference time.

Abstract

We present Edit3r, a feed-forward framework that reconstructs and edits 3D scenes in a single pass from unposed, view-inconsistent, instruction-edited images. Unlike prior methods requiring per-scene optimization, Edit3r directly predicts instruction-aligned 3D edits, enabling fast and photorealistic rendering without optimization or pose estimation. A key challenge in training such a model lies in the absence of multi-view consistent edited images for supervision. We address this with (i) a SAM2-based recoloring strategy that generates reliable, cross-view-consistent supervision, and (ii) an asymmetric input strategy that pairs a recolored reference view with raw auxiliary views, encouraging the network to fuse and align disparate observations. At inference, our model effectively handles images edited by 2D methods such as InstructPix2Pix, despite not being exposed to such edits during…
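The asymmetric input pairing described in the abstract can be sketched as a small data-assembly step. This is a minimal sketch under stated assumptions: the names `make_asymmetric_sample` and `TrainingSample` are hypothetical, and it assumes the supervision targets are the recolored versions of the same input views; the paper's actual target views and losses may differ.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class TrainingSample:
    inputs: List[Any]   # recolored reference first, then raw auxiliary views
    targets: List[Any]  # recolored version of each input view, same order


def make_asymmetric_sample(raw_views, recolored_views, ref_index=0):
    """Pair one recolored reference view with raw auxiliary views (hypothetical sketch).

    raw_views, recolored_views: equal-length lists of views of one scene;
    recolored_views[i] is assumed to be the cross-view-consistent recolor
    of raw_views[i].
    """
    if len(raw_views) != len(recolored_views):
        raise ValueError("raw and recolored view lists must have the same length")
    if not 0 <= ref_index < len(raw_views):
        raise IndexError("ref_index out of range")
    aux = [i for i in range(len(raw_views)) if i != ref_index]
    # Only the reference carries the edit on the input side.
    inputs = [recolored_views[ref_index]] + [raw_views[i] for i in aux]
    # Every target is edited, so the network must propagate the reference
    # edit into the raw auxiliary views.
    targets = [recolored_views[ref_index]] + [recolored_views[i] for i in aux]
    return TrainingSample(inputs=inputs, targets=targets)
```

For example, with three views and `ref_index=1`, the input is recolored view 1 followed by raw views 0 and 2, while the targets are recolored views 1, 0, and 2, so an edit visible only in the reference has to be carried over to the auxiliary views.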


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging