T-3DGS: Removing Transient Objects for 3D Scene Reconstruction
Alexander Markin, Vadim Pryadilshchikov, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev

TL;DR
T-3DGS introduces a robust framework that effectively filters out transient objects in video sequences, significantly improving the quality of 3D scene reconstructions using Gaussian Splatting.
Contribution
The paper presents a novel two-step method combining unsupervised classification and segmentation with tracking to remove transient distractors during 3D reconstruction.
Findings
Outperforms state-of-the-art methods on both sparsely and densely captured video datasets.
Enables high-fidelity 3D reconstructions in challenging scenarios.
Effectively distinguishes transient objects from static scene elements.
Abstract
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions. To address this challenge, we propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting. Our framework consists of two steps. First, we employ an unsupervised classification network that distinguishes transient objects from static scene elements by leveraging their distinct training dynamics within the reconstruction process. Second, we refine these initial detections by integrating an off-the-shelf segmentation method with a bidirectional tracking module, which together enhance boundary accuracy and temporal coherence. Evaluations on both sparsely and densely captured video datasets demonstrate that T-3DGS significantly outperforms state-of-the-art approaches, enabling high-fidelity 3D reconstructions in…
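The abstract does not say how the final transient masks enter the optimization. One common way to use such masks, shown here only as a minimal sketch and not as the authors' implementation, is to exclude masked pixels from the photometric loss, optionally dilating the mask first so that soft object boundaries are also ignored. The names `dilate`, `masked_l1`, and `radius` are hypothetical, and the square structuring element is an assumption.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    mask = mask.astype(bool)
    if radius <= 0:
        return mask
    h, w = mask.shape
    # Pad with False so pixels outside the image never count as transient.
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def masked_l1(rendered: np.ndarray, target: np.ndarray,
              transient_mask: np.ndarray, radius: int = 1) -> float:
    """Mean absolute error over pixels not marked transient.

    rendered, target: (H, W, 3) float images.
    transient_mask:   (H, W) boolean array, True where a distractor was detected.
    radius:           hypothetical dilation radius in pixels (assumption).
    """
    static = ~dilate(transient_mask, radius)
    if not static.any():
        # Every pixel is masked out, so there is nothing to supervise.
        return 0.0
    diff = np.abs(rendered - target)
    return float(diff[static].mean())
```

In this sketch a larger `radius` trades supervision area for robustness to imprecise mask boundaries.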
Peer Reviews
Decision · Submitted to ICLR 2026
- originality-wise: the idea of utilizing uncertainty modeling and mask propagation to handle dynamic objects is interesting.
- quality-wise: qualitative and quantitative results demonstrate the effectiveness of the proposed approach.
- clarity-wise: the paper is well-written in general.
- significance-wise: the problem of removing transient objects is important for downstream tasks of 3D reconstruction from in-the-wild videos.
1. For temporal refinement (L315): can we use only forward or only backward propagation? How much worse would the performance be, qualitatively and quantitatively?
2. Can the authors provide some runtime analysis?
3. How is the extent of dilation (L290) determined? It seems important according to Tab. 3.
4. From Fig. 2, it seems that RUP is not updated, which contradicts L170. Can the authors clarify?
5. For Fig. 9, the T-3DGS results do not seem to be from the same camera as the other methods or GT. Is this a bug
1. The authors study an interesting problem, transient objects, which is important for real-world scene reconstruction.
2. The method is well-designed and theoretically grounded.
3. The pipeline utilizes DINOv2 features to provide robustness against color similarity and high-frequency textures.
4. The experiments compare against several baseline approaches and ablate key modules. The authors also introduce a new dataset with transient objects.
5. The authors provide qualitative results to clearl
1. The training pipeline is too heavy: it uses DINOv2 to extract features, SAM for spatial refinement, and SAM2 for temporal refinement.
2. Each submodule is adapted from existing techniques.
2.1 RUP uses DINOv2 features for semantic understanding, follows WildGaussians in building a per-pixel residual with FeatUP and DSSIM, and follows NeRF-W in using uncertainty in 3DGS to separate static and transient objects.
2.2 The first part of the TMR uses SAM to clean up the noisy binary masks predicted by RUP
1. The paper is well motivated, addressing a practical problem in 3DGS-based scene editing.
2. The proposed T-3DGS dataset fills a gap for semi-transient object evaluation.
1. Limited novelty: The main contribution seems to be a combination of existing techniques (uncertainty estimation, semantic guidance, and SAM-based propagation), rather than addressing the deeper underlying issue of poor extrapolation and generalization in 3DGS.
2. Insufficient analysis: The paper lacks detailed sensitivity studies for key thresholds and hyperparameters, and the divergence-based uncertainty formulation remains largely heuristic without strong theoretical or empirical justificat
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization
