UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images

Qijian Tian; Xin Tan; Jingyu Gong; Yuan Xie; Lizhuang Ma

arXiv:2506.09378 · cs.CV · June 12, 2025

TL;DR

UniForward is a feed-forward model that reconstructs 3D scenes and semantic fields from sparse, uncalibrated images in real time, without ground truth depth, enabling high-quality rendering and open-vocabulary segmentation.

Contribution

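The paper introduces a unified feed-forward approach that reconstructs both the 3D scene and its semantic field from sparse-view images alone. Its two main components are a dual-branch decoupled decoder, which predicts the 3D Gaussians and their embedded semantic features, and a loss-guided view sampler used during training.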

Findings

01 Achieves state-of-the-art results in 3D scene and semantic field reconstruction.

02 Enables real-time reconstruction from sparse, uncalibrated images.

03 Supports open-vocabulary semantic segmentation in 3D.
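
To make the third finding concrete, here is a minimal sketch of how open-vocabulary segmentation is commonly done on a rendered semantic feature map: each pixel's feature is compared with text embeddings of the candidate labels by cosine similarity, and the best match wins. This is an illustration under stated assumptions, not the paper's implementation. The function names, the 3-dimensional toy features, and the hand-written text embeddings are all hypothetical; in practice the embeddings would come from a vision-language text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(feature_map, text_embeddings):
    """Label every pixel with the text prompt whose embedding is most similar.

    feature_map: H x W nested list of per-pixel feature vectors (hypothetical
                 stand-in for a semantic feature map rendered from the Gaussians).
    text_embeddings: dict mapping a label string to its embedding vector.
    """
    label_map = []
    for row in feature_map:
        label_row = []
        for feature in row:
            best = max(text_embeddings,
                       key=lambda label: cosine(feature, text_embeddings[label]))
            label_row.append(best)
        label_map.append(label_row)
    return label_map


# Toy data: a 1 x 2 "image" and two label prompts (all values are made up).
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
feature_map = [[[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]]
label_map = segment(feature_map, text_embeddings)
print(label_map)
```

Because the labels are supplied as free-form text at query time, the label set can change without retraining, which is what "open-vocabulary" refers to here.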

Abstract

We propose a feed-forward Gaussian Splatting model that unifies 3D scene and semantic field reconstruction. Combining 3D scenes with semantic fields facilitates the perception and understanding of the surrounding environment. However, key challenges include embedding semantics into 3D representations, achieving generalizable real-time reconstruction, and ensuring practical applicability by using only images as input without camera parameters or ground truth depth. To this end, we propose UniForward, a feed-forward model to predict 3D Gaussians with anisotropic semantic features from only uncalibrated and unposed sparse-view images. To enable the unified representation of the 3D scene and semantic field, we embed semantic features into 3D Gaussians and predict them through a dual-branch decoupled decoder. During training, we propose a loss-guided view sampler to sample views from easy to…
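
The unified representation described in the abstract can be pictured as a Gaussian primitive that carries a semantic feature vector next to its geometry and appearance attributes, with two separate decoder heads producing the two groups. The sketch below is only a toy illustration of that idea. The single linear layer per branch, the dimensions, the attribute layout, and the activations (exponential for scale, sigmoid for opacity and color, normalized quaternion for rotation) are assumptions borrowed from common Gaussian Splatting practice, not details confirmed by the source.

```python
import math
import random
from dataclasses import dataclass
from typing import List

TOKEN_DIM = 16  # hypothetical width of a per-pixel feature token
SEM_DIM = 8     # hypothetical width of the semantic feature
GEO_DIM = 14    # 3 mean + 3 scale + 4 rotation + 1 opacity + 3 color


@dataclass
class SemanticGaussian:
    mean: List[float]      # 3D center
    scale: List[float]     # per-axis extent, positive
    rotation: List[float]  # unit quaternion
    opacity: float         # in (0, 1)
    color: List[float]     # RGB in (0, 1)
    semantic: List[float]  # embedded semantic feature


def linear(weights, x):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode(token, geo_weights, sem_weights):
    """Decode one token with two independent branches (toy linear heads)."""
    g = linear(geo_weights, token)  # geometry and appearance branch
    s = linear(sem_weights, token)  # semantic branch
    quat = g[6:10]
    norm = math.sqrt(sum(q * q for q in quat)) or 1.0
    return SemanticGaussian(
        mean=g[0:3],
        scale=[math.exp(v) for v in g[3:6]],
        rotation=[q / norm for q in quat],
        opacity=sigmoid(g[10]),
        color=[sigmoid(v) for v in g[11:14]],
        semantic=s,
    )


rng = random.Random(0)


def random_weights(out_dim, in_dim):
    """Random stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


token = [rng.uniform(-1.0, 1.0) for _ in range(TOKEN_DIM)]
gaussian = decode(token,
                  random_weights(GEO_DIM, TOKEN_DIM),
                  random_weights(SEM_DIM, TOKEN_DIM))
print(len(gaussian.semantic), gaussian.opacity)
```

Keeping the two heads separate means the semantic output can be sized and supervised independently of the geometry output, which is one plausible reading of "decoupled" in the abstract.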

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques