# A Hybrid YOLO and Segment Anything Model Pipeline for Multi-Damage Segmentation in UAV Inspection Imagery

**Authors:** Rafael Cabral, Ricardo Santos, José A. F. O. Correia, Diogo Ribeiro

PMC · DOI: 10.3390/s25216568 · 2025-10-25

## TL;DR

This paper proposes a class-specific hybrid pipeline for multi-damage segmentation in UAV inspection imagery: YOLO's native masks are used for cracks, while YOLO bounding boxes prompt SAM for efflorescence and exposed rebar.

## Contribution

A class-specific hybrid pipeline that leverages YOLO and SAM for accurate multi-damage segmentation in UAV imagery.

## Key findings

- The hybrid pipeline achieved a mean Average Precision (mAP50) of 0.593.
- Class-specific Intersection over Union (IoU) scores were 0.495 for cracks, 0.331 for efflorescence, and 0.205 for exposed rebar.
- The results highlight the effectiveness of combining specialized detectors with foundation models for infrastructure inspection.
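For readers unfamiliar with the metric, the IoU scores above compare a predicted mask with a ground-truth mask pixel by pixel. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `mask_iou`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def mask_iou(pred, target):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                intersection += 1  # pixel is set in both masks
            if p or t:
                union += 1  # pixel is set in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks (hypothetical data, not from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[0, 1, 1],
          [0, 1, 0]]
print(mask_iou(pred, target))  # 2 shared pixels / 4 pixels in the union
```

An IoU of 0.495 for cracks therefore means that, on average, just under half of the pixels covered by either the prediction or the ground truth are covered by both.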

## Abstract

The automated inspection of civil infrastructure with Unmanned Aerial Vehicles (UAVs) is hampered by the challenge of accurately segmenting multi-damage in high-resolution imagery. While foundational models like the Segment Anything Model (SAM) offer data-efficient segmentation, their effectiveness is constrained by prompting strategies, especially for geometrically complex defects. This paper presents a comprehensive comparative analysis of deep learning strategies to identify an optimal deep learning pipeline for segmenting cracks, efflorescences, and exposed rebars. It systematically evaluates three distinct end-to-end segmentation frameworks: the native output of a YOLO11 model; the Segment Anything Model (SAM), prompted by bounding boxes; and SAM, guided by a point-prompting mechanism derived from the detector’s probability map. Based on these findings, a final, optimized hybrid pipeline is proposed: for linear cracks, the native segmentation output of the SAHI-trained YOLO model is used, while for efflorescence and exposed rebar, the model’s bounding boxes are used to prompt SAM for a refined segmentation. This class-specific strategy yielded a final mean Average Precision (mAP50) of 0.593, with class-specific Intersection over Union (IoU) scores of 0.495 (cracks), 0.331 (efflorescence), and 0.205 (exposed rebar). The results establish that the future of automated inspection lies in intelligent frameworks that leverage the respective strengths of specialized detectors and powerful foundation models in a context-aware manner.
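The class-specific routing described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Detection` type, the class label strings, `hybrid_segment`, and the stand-in `make_box_filler` (which simply fills the box and takes the place of a real box-prompted SAM call) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive (assumed)
Mask = List[List[int]]


@dataclass
class Detection:
    label: str         # damage class predicted by the detector
    box: Box           # detector bounding box
    native_mask: Mask  # detector's own segmentation mask


# Assumed label strings for the classes that the abstract routes to SAM.
SAM_CLASSES = {"efflorescence", "exposed_rebar"}


def hybrid_segment(
    detections: List[Detection],
    sam_box_prompt: Callable[[Box], Mask],
) -> List[Tuple[str, Mask]]:
    """Keep the native mask for cracks; refine other classes via a box prompt."""
    results = []
    for det in detections:
        if det.label in SAM_CLASSES:
            mask = sam_box_prompt(det.box)  # box-prompted refinement
        else:
            mask = det.native_mask  # e.g. linear cracks
        results.append((det.label, mask))
    return results


def make_box_filler(height: int, width: int) -> Callable[[Box], Mask]:
    """Hypothetical stand-in for SAM: returns a mask that fills the box."""
    def fill(box: Box) -> Mask:
        x1, y1, x2, y2 = box
        return [[1 if (x1 <= x < x2 and y1 <= y < y2) else 0
                 for x in range(width)]
                for y in range(height)]
    return fill


detections = [
    Detection("crack", (0, 0, 2, 2), [[1, 0], [0, 1]]),
    Detection("efflorescence", (1, 1, 3, 3), [[0]]),
]
segments = hybrid_segment(detections, make_box_filler(4, 4))
for label, mask in segments:
    print(label, sum(map(sum, mask)))
```

Passing the segmenter in as a callable keeps the routing logic independent of any particular SAM wrapper; in practice the callable would wrap a real box-prompted SAM predictor.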

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12609025/full.md

---
Source: https://tomesphere.com/paper/PMC12609025