From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation
Mahmoud Chick Zaouali, Todd Charter, Homayoun Najjaran

TL;DR
This paper introduces a UAV-based pipeline that combines 3D Gaussian Splatting with language-guided segmentation to produce semantically meaningful 3D reconstructions for aerial inspection tasks.
Contribution
It extends Feature-3DGS with language-guided segmentation, using LSeg-based CLIP feature fields to respond to text prompts and SAM or SAM2 to refine the resulting masks, enabling semantic understanding in large-scale outdoor 3D reconstructions.
Findings
Language-driven segmentation is effective in outdoor environments
Feature backbones (CLIP-LSeg, SAM, SAM2) are compared for scene understanding
The hybrid approach enhances the semantic interpretability of photorealistic 3D models
Abstract
High-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding. In this work, we present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation. We leverage LSeg-based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest-scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings.…
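The prompt-to-point step described in the abstract can be illustrated with a minimal sketch. It assumes the rendered feature field is available as an (H, W, D) array and the language prompt as a D-dimensional text embedding; the function names, the use of cosine similarity, and the threshold value of 0.5 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def language_heatmap(features: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pixel feature and a text embedding.

    features: (H, W, D) rendered feature map (assumed layout).
    text_embedding: (D,) embedding of the language prompt.
    Returns an (H, W) heatmap with values in [-1, 1].
    """
    eps = 1e-8  # guards against division by zero for all-zero features
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    t = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return f @ t


def rough_mask_and_point(heatmap: np.ndarray, threshold: float = 0.5):
    """Threshold the heatmap and locate its highest-scoring pixel.

    The threshold is a hypothetical value chosen for illustration.
    Returns (mask, (x, y)), where mask is a boolean (H, W) array and
    (x, y) is the peak location in column/row pixel coordinates.
    """
    mask = heatmap >= threshold
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return mask, (int(col), int(row))
```

In a pipeline of this kind, the returned (x, y) point would then serve as a single positive point prompt to SAM or SAM2 on the same rendered view, with the thresholded mask acting only as the rough segmentation.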
Taxonomy
Methods: Contrastive Language-Image Pre-training · Segment Anything Model
