From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation

Mahmoud Chick Zaouali; Todd Charter; Homayoun Najjaran

arXiv:2505.17402 · cs.GR · May 26, 2025

TL;DR

This paper introduces a UAV-based pipeline that combines neural rendering, Gaussian Splatting, and language-guided segmentation to produce semantically meaningful 3D reconstructions for aerial inspection tasks.

Contribution

It extends Feature-3DGS with language-guided segmentation using CLIP and SAM, enabling semantic understanding in large-scale outdoor 3D reconstructions.

Findings

01. Effective language-driven segmentation in outdoor environments

02. Comparison of feature backbones (CLIP-LSeg, SAM, SAM2) for scene understanding

03. Hybrid approach enhances semantic interpretability of photorealistic 3D models

Abstract

High-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding. In this work, we present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation. We leverage LSeg-based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest-scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings.…
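The prompt-to-point step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rendered per-pixel feature map and a text embedding that already live in the same embedding space, and it uses cosine similarity as the scoring function. The function names, the threshold value, and the toy data are all hypothetical. The resulting (x, y) point is what would then be handed to SAM or SAM2 as a point prompt; that call is left out here.

```python
import numpy as np

def language_heatmap(features, text_embedding):
    """Cosine similarity between every pixel feature and one text embedding.

    features: (H, W, D) array, assumed to be a rendered feature map.
    text_embedding: (D,) array, assumed to share the feature space.
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + eps)
    t = text_embedding / (np.linalg.norm(text_embedding) + eps)
    return f @ t  # shape (H, W)

def rough_mask_and_point(heatmap, threshold=0.5):
    """Threshold the heatmap and return the highest-scoring pixel as (x, y)."""
    mask = heatmap >= threshold
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return mask, (int(col), int(row))

# Toy data (hypothetical): a 4x6 map with 8-dim features. Background pixels
# point along axis 1, a 2x2 "object" patch points along axis 0, and the
# text embedding also points along axis 0.
H, W, D = 4, 6, 8
features = np.zeros((H, W, D))
features[..., 1] = 1.0
features[1:3, 2:4] = 0.0
features[1:3, 2:4, 0] = 1.0
text_embedding = np.zeros(D)
text_embedding[0] = 1.0

heatmap = language_heatmap(features, text_embedding)
mask, point = rough_mask_and_point(heatmap, threshold=0.5)
```

In this toy case the mask covers exactly the 2x2 patch and the point falls inside it. On real renderings the threshold would need tuning per prompt, which is one reason a refinement stage on top of the rough mask is useful.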

Taxonomy

Methods: Contrastive Language-Image Pre-training · Segment Anything Model