Edge-Optimized Vision-Language Models for Underground Infrastructure Assessment
Johny J. Lopez, Md Meftahul Ferdaus, Mahdi Abdelguerfi

TL;DR
This paper introduces an efficient, edge-deployable AI pipeline combining defect segmentation and natural language summarization for underground infrastructure inspection, enabling real-time autonomous assessment on resource-constrained devices.
Contribution
The paper presents a novel two-stage pipeline with a lightweight segmentation model and a fine-tuned vision-language model optimized for edge deployment in underground infrastructure inspection.
Findings
RAPID-SCAN achieves an F1-score of 0.834 for defect segmentation with only 0.64M parameters.
Hardware-specific model optimization enables real-time summarization on the edge device.
The pipeline was deployed successfully on a mobile robotic platform.
Abstract
Autonomous inspection of underground infrastructure, such as sewer and culvert systems, is critical to public safety and urban sustainability. Although robotic platforms equipped with visual sensors can efficiently detect structural deficiencies, the automated generation of human-readable summaries from these detections remains a significant challenge, especially on resource-constrained edge devices. This paper presents a novel two-stage pipeline for end-to-end summarization of underground deficiencies, combining our lightweight RAPID-SCAN segmentation model with a fine-tuned Vision-Language Model (VLM) deployed on an edge computing platform. The first stage employs RAPID-SCAN (Resource-Aware Pipeline Inspection and Defect Segmentation using Compact Adaptive Network), achieving 0.834 F1-score with only 0.64M parameters for efficient defect segmentation. The second stage utilizes a…
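For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of a pixel-wise F1-score for a single binary defect mask. It assumes the standard definition F1 = 2TP / (2TP + FP + FN); the abstract does not state how the reported 0.834 is averaged across classes or images, so the function name `f1_score_binary` and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
def f1_score_binary(pred, target):
    """Pixel-wise F1 for two equally shaped binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # defect pixel correctly predicted
            elif p and not t:
                fp += 1      # predicted defect where there is none
            elif t and not p:
                fn += 1      # missed defect pixel
    denom = 2 * tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = f1_score_binary(pred, target)
print(round(score, 4))
```

Because F1 ignores true negatives, it is not inflated by the large background regions typical of pipe and culvert imagery, which is one reason it is commonly preferred over plain pixel accuracy for defect segmentation.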
Taxonomy
Topics: Infrastructure Maintenance and Monitoring · Advanced Neural Network Applications · Multimodal Machine Learning Applications
