# Vision–Language Models for Transmission Line Fault Detection: A New Approach for Grid Reliability and Optimization

**Authors:** Runle Yu, Lihao Mai, Yang Weng, Qiushi Cui, Guochang Xu, Pengliang Ren

PMC · DOI: 10.3390/jimaging12030106 · Journal of Imaging · 2026-02-28

## TL;DR

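This paper adds three domain-specific components to a pre-trained Florence-2 vision-language model. The components help it detect faults in transmission-line inspection imagery more accurately and with fewer false alarms, without retraining the large model end to end.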

## Contribution

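The novel contribution is the integration of three domain-specific components with a pre-trained vision-language model for transmission line fault detection, without end-to-end retraining of the backbone. The components are subclass-aware fusion, Power-Line Focus Then Crop normalization, and a corridor geo prior.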

## Key findings

- On imagery from unseen regions, the proposed methods improve accuracy for thin and low-contrast faults in transmission corridors.
- A corridor geo prior reduces false alarms outside the right-of-way.
- Score calibration improves in the confidence range used for triage.
- Throughput and memory usage remain suitable for deployment on unmanned aerial vehicles (UAVs) and substation edge devices.
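According to the abstract, the corridor geo prior reduces detection scores as the distance from the mapped centerline increases. The following is a minimal sketch of that idea, not the authors' implementation. It assumes three things: corridor centerlines are available as polylines in a local planar (metric) projection, the decay is Gaussian, and `sigma` sets the decay width. The paper summary does not state the decay shape, and the function names and the `sigma` default are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in a local planar projection, metres (assumed)


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_centerline(p: Point, centerline: Sequence[Point]) -> float:
    """Shortest distance from p to a corridor centerline given as a polyline."""
    if len(centerline) == 1:
        return math.hypot(p[0] - centerline[0][0], p[1] - centerline[0][1])
    return min(
        point_to_segment_distance(p, centerline[i], centerline[i + 1])
        for i in range(len(centerline) - 1)
    )


def apply_geo_prior(score: float, p: Point, centerline: Sequence[Point],
                    sigma: float = 30.0) -> float:
    """Hypothetical geo prior: down-weight a detection score with a Gaussian
    that decays as the detection moves away from the mapped centerline."""
    d = distance_to_centerline(p, centerline)
    return score * math.exp(-0.5 * (d / sigma) ** 2)


if __name__ == "__main__":
    line = [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0)]  # toy centerline
    print(apply_geo_prior(0.9, (50.0, 10.0), line))  # near the corridor
    print(apply_geo_prior(0.9, (50.0, 75.0), line))  # well outside it
```

Detections on the centerline keep their score, and detections far outside the right-of-way are pushed toward zero. Because the prior multiplies the score rather than adding a penalty, a strong in-corridor detection still ranks above a weak one. A production system would more likely use a GIS library for the distance computation.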

## Abstract

Reliable fault detection along transmission corridors is essential for preventing small defects from developing into long outages and costly emergency operations. This study aims to improve the field reliability of an open-vocabulary vision-language backbone without retraining the large model in an end-to-end manner. The work focuses on four operational fault classes in multi-region corridor imagery collected during routine inspections and uses a Florence-2 vision-language model as the base recognizer. On top of this backbone, three domain-specific components are introduced. A subclass-aware fusion scheme keeps probability mass within the active parent concept so that insulator icing and conductor icing produce stable, action-oriented decisions. A Power-Line Focus Then Crop normalization uses an attention-guided corridor window together with isotropic resizing so that thin conductors and small fittings remain visible in the processed image. A corridor geo prior reduces scores as the distance from the mapped centerline increases, thereby suppressing detections that lie outside the corridor. All methods are evaluated under a shared preprocessing and scoring pipeline in training-free and parameter-efficient tuning modes. Experiments on unseen regions show higher accuracy for thin and low-contrast faults, fewer false alarms outside the right-of-way, and improved score calibration in the confidence range used for triage, while keeping throughput and memory usage suitable for unmanned aerial vehicles and substation edge devices.
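The following hypothetical sketch illustrates the subclass-aware fusion and the Focus Then Crop normalization described in the abstract. It is not the authors' code. Several details are assumptions:

- The label names other than insulator icing and conductor icing.
- The attention threshold and margin values.
- The nearest-neighbour resampling.
- The top-left zero padding.

The fusion step picks the parent concept with the most probability mass and renormalizes only over its subclasses. The crop step takes the bounding box of high-attention cells before resizing with a single scale factor. It assumes the attention map has already been upsampled to image resolution.

```python
import numpy as np


def subclass_aware_fusion(logits, labels, parent_of):
    """Renormalise class probabilities inside the most probable parent concept.

    logits: raw per-subclass scores; labels: subclass names;
    parent_of: maps each subclass name to its parent concept.
    """
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())  # numerically stable softmax
    probs /= probs.sum()
    parents = np.array([parent_of[label] for label in labels])
    # Total probability mass carried by each parent concept.
    mass = {p: float(probs[parents == p].sum()) for p in set(parents.tolist())}
    active = max(mass, key=mass.get)
    # Keep mass only on subclasses of the active parent, then renormalise.
    fused = np.where(parents == active, probs, 0.0)
    fused /= fused.sum()
    return active, dict(zip(labels, fused.tolist()))


def focus_window(attn, thresh=0.5, margin=0.1):
    """Return (r0, r1, c0, c1) bounding cells with attention >= thresh * max,
    widened by a relative margin and clipped to the map."""
    h, w = attn.shape
    if attn.max() <= 0:
        return 0, h, 0, w  # no usable attention: keep the full frame
    mask = attn >= thresh * attn.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = int(rows[0]), int(rows[-1]) + 1
    c0, c1 = int(cols[0]), int(cols[-1]) + 1
    mh = int(round(margin * (r1 - r0)))
    mw = int(round(margin * (c1 - c0)))
    return max(0, r0 - mh), min(h, r1 + mh), max(0, c0 - mw), min(w, c1 + mw)


def isotropic_resize(img, target=768):
    """Scale by one factor so the longer side equals `target` (aspect ratio kept),
    using nearest-neighbour sampling, then zero-pad to target x target at top-left."""
    h, w = img.shape[:2]
    s = target / max(h, w)
    nh, nw = max(1, int(round(h * s))), max(1, int(round(w * s)))
    ri = np.minimum((np.arange(nh) / s).astype(int), h - 1)
    ci = np.minimum((np.arange(nw) / s).astype(int), w - 1)
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:nh, :nw] = img[ri][:, ci]
    return out, s


if __name__ == "__main__":
    # Hypothetical label set: only the two icing subclasses come from the paper.
    labels = ["insulator icing", "conductor icing", "foreign object", "broken strand"]
    parent_of = {"insulator icing": "icing", "conductor icing": "icing",
                 "foreign object": "foreign object", "broken strand": "broken strand"}
    print(subclass_aware_fusion([2.0, 1.5, 0.5, 0.1], labels, parent_of))
```

Renormalizing inside the active parent keeps a strong icing signal split between insulator icing and conductor icing instead of leaking into unrelated classes. Scaling both image axes by the same factor avoids stretching or squashing thin conductors, as non-uniform resizing would.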

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13028582/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13028582/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC13028582/full.md

---
Source: https://tomesphere.com/paper/PMC13028582