YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision
Muhammad Hussain

TL;DR
This paper reviews the evolution of YOLO object detectors from YOLOv5 to YOLOv10, highlighting the architectural innovations and performance improvements of each version and their suitability for real-time and edge applications.
Contribution
It provides a comprehensive analysis of architectural advancements and performance gains across YOLO versions, guiding optimal selection for specific deployment needs.
Findings
YOLOv5 introduced the CSPDarknet backbone and Mosaic augmentation.
YOLOv8 enhanced feature extraction and adopted anchor-free detection.
YOLOv10 achieved state-of-the-art performance with reduced computation.
Abstract
This paper presents a comprehensive review of the evolution of the YOLO (You Only Look Once) object detection algorithm, focusing on YOLOv5, YOLOv8, and YOLOv10. We analyze the architectural advancements, performance improvements, and suitability for edge deployment across these versions. YOLOv5 introduced significant innovations such as the CSPDarknet backbone and Mosaic Augmentation, balancing speed and accuracy. YOLOv8 built upon this foundation with enhanced feature extraction and anchor-free detection, improving versatility and performance. YOLOv10 represents a leap forward with NMS-free training, spatial-channel decoupled downsampling, and large-kernel convolutions, achieving state-of-the-art performance with reduced computational overhead. Our findings highlight the progressive enhancements in accuracy, efficiency, and real-time performance, particularly emphasizing their…
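To make the "NMS-free" point concrete, the following is a minimal sketch of greedy non-maximum suppression (NMS), the post-processing step that conventional detectors run at inference and that YOLOv10's design is described as removing. It is an illustration only, not code from the paper or from any YOLO release: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, highest score first.

    A box is kept only if it does not overlap any already-kept box
    by more than iou_threshold (an assumed, illustrative value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    # The second box overlaps the first heavily and is suppressed.
    print(greedy_nms(boxes, scores))
```

The pairwise overlap check is extra work at inference time and adds a threshold that must be tuned, which is the overhead an NMS-free detector avoids by emitting one prediction per object directly.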
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Advanced Semiconductor Detectors and Materials
Methods: You Only Look Once · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
