Distractor-free Generalizable 3D Gaussian Splatting
Yanqi Bao, Jing Liao, Jing Huo, Yang Gao

TL;DR
DGGS is a framework for generalizable 3D Gaussian Splatting that handles distractor data. It improves training stability, generalization, and inference quality in unseen scenes through reference-based mask prediction and distractor removal at inference time.
Contribution
It proposes a scene-agnostic, reference-based mask prediction and a two-stage inference framework to eliminate distractor effects in 3D Gaussian Splatting, enhancing generalization and stability.
Findings
Outperforms existing methods in distractor scene reconstruction.
Achieves superior mask prediction accuracy compared to scene-specific training.
Effectively removes distractor artifacts and holes during inference.
Abstract
We present DGGS, a novel framework that addresses a previously unexplored challenge: Distractor-free Generalizable 3D Gaussian Splatting (3DGS). It mitigates the 3D inconsistency and training instability caused by distractor data in the cross-scene generalizable training setting, while enabling feedforward inference of 3DGS and distractor masks from references in unseen scenes. To achieve these objectives, DGGS proposes a scene-agnostic, reference-based mask prediction and refinement module for the training phase, which effectively eliminates the impact of distractors on training stability. Moreover, we combat distractor-induced artifacts and holes at inference time through a novel two-stage inference framework for reference scoring and re-selection, complemented by a distractor pruning mechanism that further removes the influence of residual distractor 3DGS primitives. Extensive…
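The reference scoring and re-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring rule, so the one used here (rank candidate references by the fraction of pixels their predicted mask flags as distractor, then keep the k cleanest) is an assumption made for illustration, not the authors' method. The names `distractor_ratio` and `reselect_references` are hypothetical.

```python
from typing import List, Sequence

# A mask is a 2D grid of booleans; True marks a pixel predicted as distractor.
Mask = List[List[bool]]


def distractor_ratio(mask: Mask) -> float:
    """Fraction of pixels flagged as distractor (assumed scoring rule)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    flagged = sum(1 for row in mask for px in row if px)
    return flagged / total


def reselect_references(masks: Sequence[Mask], k: int) -> List[int]:
    """Return the indices of the k candidates with the lowest distractor ratio.

    Ties are broken by candidate index so the result is deterministic.
    """
    ranked = sorted(range(len(masks)), key=lambda i: (distractor_ratio(masks[i]), i))
    return sorted(ranked[:k])


# Four hypothetical candidate references with 2x2 predicted masks.
candidates = [
    [[False, False], [False, False]],  # ratio 0.0
    [[True, True], [True, False]],     # ratio 0.75
    [[False, True], [False, False]],   # ratio 0.25
    [[True, True], [True, True]],      # ratio 1.0
]

selected = reselect_references(candidates, k=2)
print(selected)  # [0, 2]
```

In this sketch the second stage would rebuild the scene from the re-selected references only; how DGGS actually scores references is not specified in the text above.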
Peer Reviews
Decision: ICLR 2026 Poster
- Clearly identifies and formulates a new, practically relevant problem: distractor-free generalizable 3DGS.
- Elegant reference-based mask filtering that reduces over-suppression typical of residual-only masks.
- Thoughtful mask refinement: decoupling disparity vs. distractor; auxiliary loss exploiting cross-view occlusion cues.
- Practical two-stage inference: reference scoring and 3D primitive pruning demonstrably reduce artifacts/holes.
- Strong empirical results with comprehensive compa

(1) Dependence on pre-trained entity segmentation during training and inference undermines full “feed-forward” purity and adds latency; domain robustness of the segmenter is not analyzed.
(2) The mask fusion strategy uses intersection across references (conservative), which may lead to under-coverage in low-overlap or high-parallax settings; the trade-off is not deeply quantified.
(3) Distractor pruning can introduce speckle/holes in commonly occluded areas; mitigation is heuristic and the fai

- This is the first work addressing distractors in generalizable 3DGS, filling an important gap in real-world usage.
- It significantly boosts robustness and reconstruction quality compared to both baseline 3DGS models and naively transferred scene-specific distractor-free methods.
- The approach generalizes well to unseen scenes and improves inference quality via smart reference selection and pruning.

- The method relies on several additional modules, but the sensitivity of the overall performance to the choice or quality of these modules is not discussed.
- The quality of the generated masks depends on the accuracy of segmentation and depth estimation, which may lead to failure cases in scenes with heavy occlusions or imprecise geometry.
- Since the approach depends on mask generation, it is unclear how well it would handle naturally dynamic environments, such as moving trees or water, where

1. The use of multi-view geometric consistency to correct masks, reflects good insight rather than brute force.
2. Comprehensive experiments, including synthetic distractor construction.
3. Inference-time pruning is practical and effective.

1. Heavy reliance on segmentation priors. The pipeline is not truly “feed-forward generalizable” if high-quality segmentation is required and pre-computed.
2. Reference stability assumption unproven. The paper does not quantify how often reference re-rendering is accurate enough to serve as a stable supervisory source.
3. Efficiency cost. Two-stage inference + segmentation noticeably sacrifices speed, which is a key appeal of 3DGS.
4. Mask failure modes not fully analyzed. The limitations sec
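One reviewer's concern about intersection-based mask fusion can be made concrete with a small sketch. It assumes each reference contributes a boolean mask of the same size and that fusion keeps a pixel only when every reference flags it; the union variant is included purely as a contrast and is not claimed to be part of DGGS. The function names are hypothetical.

```python
from typing import List

# True marks a pixel flagged by a given reference's mask.
Mask = List[List[bool]]


def _shape(masks: List[Mask]):
    if not masks:
        raise ValueError("need at least one mask")
    height = len(masks[0])
    width = len(masks[0][0]) if height else 0
    return height, width


def fuse_intersection(masks: List[Mask]) -> Mask:
    """Keep a pixel only if every reference flags it (conservative)."""
    h, w = _shape(masks)
    return [[all(m[r][c] for m in masks) for c in range(w)] for r in range(h)]


def fuse_union(masks: List[Mask]) -> Mask:
    """Keep a pixel if any reference flags it (shown only for comparison)."""
    h, w = _shape(masks)
    return [[any(m[r][c] for m in masks) for c in range(w)] for r in range(h)]


def coverage(mask: Mask) -> int:
    """Number of flagged pixels."""
    return sum(1 for row in mask for px in row if px)


# Two hypothetical references that only partly agree.
ref_a = [[True, True, False], [False, True, False]]
ref_b = [[False, True, True], [False, True, False]]

inter = fuse_intersection([ref_a, ref_b])
union = fuse_union([ref_a, ref_b])
print(coverage(inter), coverage(union))  # 2 4
```

When the references overlap little, the intersection shrinks toward the empty mask while the union keeps growing, which is the under-coverage trade-off the review says is not quantified.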
Taxonomy
Topics: 3D Shape Modeling and Analysis
Methods: Pruning
