TL;DR
FLEG is a feed-forward network that reconstructs language-embedded 3D Gaussians from an arbitrary number of input views, cutting storage costs while preserving semantic fidelity.
Contribution
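The paper proposes a geometry-semantic dual-branch distillation framework that accepts arbitrary multi-view images without camera parameters, and a decoupled language embedding strategy that exploits the sparsity of semantic representations instead of attaching an embedding to every Gaussian.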
Findings
FLEG outperforms existing methods in reconstruction quality.
It uses only 5% as many language embeddings as dense per-Gaussian schemes.
It maintains semantic fidelity at a significantly reduced storage cost.
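To make the storage claim concrete, the following is a rough back-of-the-envelope sketch, not the paper's implementation. It compares a dense scheme (one embedding per Gaussian) with a sparse one that keeps embeddings for 5% of the Gaussians. The function name `estimate_storage`, the 512-dimensional float32 embedding, and the per-Gaussian int32 index into a shared embedding table are all assumptions made for illustration; the summary does not say how FLEG links Gaussians to embeddings.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assumption: embeddings stored as float32
BYTES_PER_INDEX = 4  # assumption: one int32 index per Gaussian


@dataclass
class StorageEstimate:
    dense_bytes: int
    sparse_bytes: int

    @property
    def ratio(self) -> float:
        """Sparse storage as a fraction of dense storage."""
        return self.sparse_bytes / self.dense_bytes


def estimate_storage(num_gaussians: int, embed_dim: int,
                     keep_fraction: float = 0.05) -> StorageEstimate:
    """Hypothetical helper: compare dense and sparse embedding storage."""
    if num_gaussians <= 0 or embed_dim <= 0:
        raise ValueError("num_gaussians and embed_dim must be positive")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Dense: every Gaussian carries its own embedding vector.
    dense = num_gaussians * embed_dim * BYTES_PER_FLOAT

    # Sparse: only a fraction of the embeddings are stored, plus an
    # index per Gaussian pointing into the shared table (an assumption).
    num_embeddings = max(1, round(num_gaussians * keep_fraction))
    sparse = (num_embeddings * embed_dim * BYTES_PER_FLOAT
              + num_gaussians * BYTES_PER_INDEX)

    return StorageEstimate(dense_bytes=dense, sparse_bytes=sparse)


if __name__ == "__main__":
    est = estimate_storage(num_gaussians=100_000, embed_dim=512)
    print(f"dense:  {est.dense_bytes / 1e6:.1f} MB")
    print(f"sparse: {est.sparse_bytes / 1e6:.1f} MB")
    print(f"ratio:  {est.ratio:.3f}")
```

Under these assumptions the index overhead is small next to the embedding table, so the total semantic storage lands close to the 5% embedding fraction. The real savings depend on the embedding dimension and on how FLEG actually associates Gaussians with embeddings.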
Abstract
We present FLEG, a feed-forward network that reconstructs language-embedded 3D Gaussians from arbitrary views. Previous feed-forward language-embedded Gaussian reconstruction methods are restricted to a fixed number of input views and typically attach a language-aligned semantic embedding to each Gaussian, resulting in impractical input settings and semantic redundancy. In contrast, we introduce a geometry-semantic dual-branch distillation framework that enables flexible input from arbitrary multi-view images without camera parameters. We also propose a novel-view-based distillation strategy during training that mitigates overfitting to input views. In addition, we observe that semantic representations are significantly sparser than geometric ones, and per-Gaussian language embedding is unnecessary. To exploit this sparsity, we design a decoupled language embedding strategy that…
