Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10
Vung Pham, Lan Dong Thi Ngoc, and Duy-Linh Bui

TL;DR
This study compares various YOLO architectures, including custom and lightweight models, for efficient and accurate road damage detection, emphasizing inference speed optimization and dataset augmentation.
Contribution
It introduces a custom YOLOv7-based model that combines Coordinate Attention with model reparameterization to improve both detection performance and inference speed.
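Model reparameterization trains a block with several parallel branches and then folds them into a single convolution for inference, so the deployed network does less work per image. The sketch below shows only the general idea, assuming a RepVGG-style block (a 3x3 conv + BN branch summed with a 1x1 conv + BN branch, identity branch omitted); the exact block used in the paper's YOLOv7 variant may differ. It uses NumPy with a naive convolution, and all sizes and helper names are hypothetical.

```python
import numpy as np

def conv2d(x, w, b=None):
    """Naive stride-1 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,) or None."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    y = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) to every output pixel.
            y += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    if b is not None:
        y += b[:, None, None]
    return y

def batchnorm(z, gamma, beta, mean, var, eps=1e-5):
    """Eval-mode batch normalization over the channel axis."""
    return (gamma[:, None, None] * (z - mean[:, None, None])
            / np.sqrt(var[:, None, None] + eps) + beta[:, None, None])

def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold eval-mode BN into the preceding bias-free conv; returns (w', b')."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale

def random_bn(c, rng):
    """Hypothetical BN statistics: (gamma, beta, running mean, running var)."""
    return (rng.uniform(0.5, 1.5, c), rng.normal(size=c),
            rng.normal(size=c), rng.uniform(0.5, 1.5, c))

rng = np.random.default_rng(0)
c_in, c_out = 4, 8
x = rng.normal(size=(c_in, 16, 16))
w3 = rng.normal(size=(c_out, c_in, 3, 3))
w1 = rng.normal(size=(c_out, c_in, 1, 1))
bn3, bn1 = random_bn(c_out, rng), random_bn(c_out, rng)

# Training-time form: two conv-BN branches whose outputs are summed.
y_train = batchnorm(conv2d(x, w3), *bn3) + batchnorm(conv2d(x, w1), *bn1)

# Inference-time form: a single 3x3 conv with bias.
w3f, b3f = fuse_conv_bn(w3, *bn3)
w1f, b1f = fuse_conv_bn(w1, *bn1)
w_rep = w3f + np.pad(w1f, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1x1 kernel placed at the 3x3 center
b_rep = b3f + b1f
y_infer = conv2d(x, w_rep, b_rep)

print(np.allclose(y_train, y_infer))  # True
```

Because convolution and eval-mode BN are both linear, the merged kernel reproduces the two-branch output exactly (up to floating-point error) while running one convolution instead of two.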
Findings
Ensemble model achieves an F1 score of 0.7027
Inference speed of 0.0547 seconds per image
Incorporating an external pothole dataset improves detection of the underrepresented pothole class
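The F1 figure above depends on how predicted boxes are matched to ground truth. This summary does not state the exact rule, so the sketch below assumes a common convention: greedy one-to-one matching of same-class boxes at IoU ≥ 0.5, highest-confidence predictions first, with F1 computed from the resulting counts. The class labels and boxes in the usage example are illustrative only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Pred = Tuple[str, float, Box]            # (class label, confidence, box)
GT = Tuple[str, Box]                     # (class label, box)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_image(preds: List[Pred], gts: List[GT], iou_thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching for one image (assumed rule); returns (TP, FP, FN)."""
    used = [False] * len(gts)
    tp = 0
    for label, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_k, best_iou = -1, iou_thr
        for k, (g_label, g_box) in enumerate(gts):
            if used[k] or g_label != label:
                continue
            overlap = iou(box, g_box)
            if overlap >= best_iou:
                best_k, best_iou = k, overlap
        if best_k >= 0:
            used[best_k] = True
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

gts = [("D00", (0, 0, 10, 10)), ("D40", (20, 20, 30, 30))]
preds = [("D00", 0.9, (1, 1, 10, 10)),    # IoU 0.81 with the D00 box -> TP
         ("D40", 0.8, (50, 50, 60, 60)),  # no overlap -> FP (the D40 box becomes an FN)
         ("D20", 0.3, (0, 0, 5, 5))]      # no D20 ground truth -> FP
tp, fp, fn = match_image(preds, gts)
print(tp, fp, fn, f1_from_counts(tp, fp, fn))  # 1 2 1 0.4
```

For a whole test set, one would typically sum TP, FP, and FN over all images before computing F1; whether the reported score was aggregated this way is an assumption.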
Abstract
Maintaining roadway infrastructure is essential for ensuring a safe, efficient, and sustainable transportation system. However, manual data collection for detecting road damage is time-consuming, labor-intensive, and poses safety risks. Recent advancements in artificial intelligence, particularly deep learning, offer a promising solution for automating this process using road images. This paper presents a comprehensive workflow for road damage detection using deep learning models, focusing on optimizations for inference speed while preserving detection accuracy. Specifically, to accommodate hardware limitations, large images are cropped, and lightweight models are utilized. Additionally, an external pothole dataset is incorporated to enhance the detection of this underrepresented damage class. The proposed approach employs multiple model architectures, including a custom YOLOv7 model…
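The abstract says large images are cropped to fit hardware limits but does not describe the scheme. One common approach is overlapping tiling, sketched below under that assumption; the 640-pixel tile size, the overlap, and the rule for keeping partially visible boxes are all hypothetical choices, not the paper's settings.

```python
import numpy as np
from typing import Iterator, List, Tuple

def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image border
    return starts

def crop_tiles(image: np.ndarray, tile: int = 640,
               overlap: int = 64) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (x0, y0, crop) for an H x W x C image."""
    h, w = image.shape[:2]
    for y0 in tile_origins(h, tile, overlap):
        for x0 in tile_origins(w, tile, overlap):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

def boxes_in_tile(boxes, x0: int, y0: int, tile: int, min_visible: float = 0.5):
    """Clip (x1, y1, x2, y2) boxes to a tile and shift them into tile coordinates,
    keeping boxes with at least `min_visible` of their area inside (hypothetical rule)."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx1, cy1 = max(x1, x0), max(y1, y0)
        cx2, cy2 = min(x2, x0 + tile), min(y2, y0 + tile)
        if cx2 <= cx1 or cy2 <= cy1:
            continue  # box does not overlap this tile
        if (cx2 - cx1) * (cy2 - cy1) >= min_visible * (x2 - x1) * (y2 - y1):
            kept.append((cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0))
    return kept

image = np.zeros((1080, 1920, 3), dtype=np.uint8)
tiles = list(crop_tiles(image, tile=640, overlap=64))
print(len(tiles), tiles[0][2].shape)  # 8 (640, 640, 3)
```

Overlap keeps damage that straddles a tile boundary fully visible in at least one crop; at inference time, detections from neighboring tiles would then need to be shifted back by (x0, y0) and de-duplicated.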
Taxonomy
Topics: Infrastructure Maintenance and Monitoring · Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Coordinate attention
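Coordinate attention, listed among the methods and named in the Contribution, pools a feature map separately along its height and width so that the resulting channel attention also carries positional information. The NumPy forward pass below is a simplified sketch of that general mechanism, not the paper's layer: it treats 1x1 convolutions as matrix products, omits batch normalization, and uses ReLU as the nonlinearity. The weights are random placeholders, and the reduction ratio is an arbitrary choice.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def coord_attention(x: np.ndarray, w_reduce: np.ndarray,
                    w_h: np.ndarray, w_w: np.ndarray) -> np.ndarray:
    """Simplified coordinate attention for one feature map.
    x: (C, H, W); w_reduce: (C_mid, C); w_h, w_w: (C, C_mid)."""
    C, H, W = x.shape
    z_h = x.mean(axis=2)                    # (C, H): average-pool along the width
    z_w = x.mean(axis=1)                    # (C, W): average-pool along the height
    z = np.concatenate([z_h, z_w], axis=1)  # (C, H + W): both directions side by side
    f = np.maximum(w_reduce @ z, 0.0)       # shared 1x1 conv + ReLU -> (C_mid, H + W)
    f_h, f_w = f[:, :H], f[:, H:]           # split back into the two directions
    g_h = sigmoid(w_h @ f_h)                # (C, H) attention weights along the height
    g_w = sigmoid(w_w @ f_w)                # (C, W) attention weights along the width
    return x * g_h[:, :, None] * g_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W, r = 32, 20, 24, 8                  # r: hypothetical channel-reduction ratio
C_mid = C // r
x = rng.normal(size=(C, H, W))
y = coord_attention(x,
                    rng.normal(size=(C_mid, C)),
                    rng.normal(size=(C, C_mid)),
                    rng.normal(size=(C, C_mid)))
print(y.shape)  # (32, 20, 24)
```

Because both gates lie in (0, 1), the block only rescales features, and the rescaling at position (h, w) depends on that row and that column, which is what lets it emphasize elongated structures such as longitudinal or transverse cracks.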
