PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging

Tianyi Qu; Songxiao Yang; Haolin Wang; Huadong Song; Xiaoting Guo; Wenguang Hu; Guanlin Liu; Honghe Chen; Yafei Ou

arXiv:2602.07044 · cs.CV · April 23, 2026

TL;DR

PipeMFL-240K is a large, annotated dataset and benchmark for object detection in pipeline Magnetic Flux Leakage (MFL) images. It captures real-world inspection challenges and is intended to drive progress in automated pipeline inspection.

Contribution

The paper introduces the first large-scale public dataset and benchmark for object detection in pipeline MFL imaging, enabling fair comparison and reproducible research.

Findings

01

Modern object detectors struggle with the complexity of MFL data.

02

The dataset contains 249,320 images with 200,020 annotations.

03

Results highlight significant room for improvement in detection algorithms.

Abstract

Pipeline integrity is critical to industrial safety and environmental protection, and Magnetic Flux Leakage (MFL) detection is a primary non-destructive testing technology. Despite the promise of deep learning for automating MFL interpretation, progress toward reliable models has been constrained by the absence of a large-scale public dataset and benchmark, which makes fair comparison and reproducible evaluation difficult. We introduce PipeMFL-240K, a large-scale, meticulously annotated dataset and benchmark for complex object detection in pipeline MFL pseudo-color images. PipeMFL-240K reflects real-world inspection complexity and poses several unique challenges: (i) an extremely long-tailed distribution over 12 categories, (ii) a high prevalence of tiny objects that often comprise only a handful of pixels, and (iii) substantial intra-class variability. The dataset…
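Challenges (i) and (ii) can be quantified directly from bounding-box annotations. The following sketch is illustrative only and is not the authors' tooling. It assumes COCO-style annotation records, each with a `category_id` and a `bbox` of `[x, y, width, height]` in pixels. It also borrows COCO's 32×32-pixel "small object" area threshold as a stand-in for "tiny", which may not match the paper's definition. The helper name `dataset_stats` and the sample records are hypothetical.

```python
from collections import Counter

# Assumption: COCO's "small object" cutoff (area < 32*32 px) stands in for "tiny";
# the PipeMFL-240K paper may define tiny objects differently.
TINY_AREA = 32 * 32


def dataset_stats(annotations, tiny_area=TINY_AREA):
    """Summarize long-tail and tiny-object statistics (hypothetical helper).

    Each annotation is assumed to be a dict with "category_id" and
    "bbox" = [x, y, width, height] in pixels (COCO-style).
    """
    counts = Counter(a["category_id"] for a in annotations)
    if not counts:
        raise ValueError("no annotations")
    # Only categories present in the input are counted; zero-count classes are ignored.
    head = max(counts.values())
    tail = min(counts.values())
    tiny = sum(1 for a in annotations if a["bbox"][2] * a["bbox"][3] < tiny_area)
    return {
        "per_class": dict(counts),
        "imbalance_ratio": head / tail,  # largest class count / smallest class count
        "tiny_fraction": tiny / len(annotations),  # share of boxes below tiny_area
    }


# Hypothetical sample records for illustration, not taken from the dataset.
sample_annotations = [
    {"category_id": 1, "bbox": [0, 0, 10, 10]},  # area 100: tiny
    {"category_id": 1, "bbox": [0, 0, 50, 50]},  # area 2500: not tiny
    {"category_id": 1, "bbox": [5, 5, 8, 4]},    # area 32: tiny
    {"category_id": 2, "bbox": [0, 0, 40, 40]},  # area 1600: not tiny
]

if __name__ == "__main__":
    print(dataset_stats(sample_annotations))
```

On the sample records, class 1 has three boxes and class 2 has one, which gives an imbalance ratio of 3.0, and half of the boxes fall below the assumed tiny threshold. The imbalance ratio only reflects categories that actually appear in the input. A category with no annotations at all would need to be handled separately.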

Code & Models

Datasets

PipeMFL/PipeMFL-240K
dataset · 826 downloads