YOLO -- You only look 10647 times
Christian Limberg, Andrew Melnik, Augustin Harter, Helge Ritter

TL;DR
This paper explains the YOLO object detection method as a parallel classification of over ten thousand fixed region proposals, linking it conceptually to other detection and classification models, and provides visualization tools for better understanding.
Contribution
It offers a new perspective on YOLO as a parallel classification process and introduces visualization tools to explore YOLO's information processing streams.
Findings
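Each YOLO output pixel attends to a specific sub-region of earlier layers, which makes it comparable to a local region proposal.
This view narrows the conceptual gap between single-stage detectors such as YOLO, two-stage region-proposal models such as RCNN, and image classifiers such as ResNet.
Interactive visualization tools help explore YOLO's information processing streams.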
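To make the "output pixel as region proposal" idea concrete, here is a minimal sketch that maps one output grid cell back to the image patch it is nominally responsible for. The function name `cell_footprint` and the stride of 32 are illustrative assumptions, not taken from the paper. The true receptive field of an output pixel is typically larger than this nominal footprint, so the sketch only shows where each cell is anchored.

```python
def cell_footprint(row, col, stride):
    """Return (x0, y0, x1, y1) of the input patch nominally owned by an output cell.

    Hypothetical helper: assumes each output cell sits on a regular grid
    whose spacing in input pixels equals the network stride.
    """
    x0 = col * stride
    y0 = row * stride
    return (x0, y0, x0 + stride, y0 + stride)


# Assumed example: a 13x13 output grid over a 416x416 input (stride 32).
print(cell_footprint(0, 0, 32))    # top-left cell
print(cell_footprint(12, 12, 32))  # bottom-right cell
```

Every cell has a fixed position, so the set of candidate regions is determined by the architecture rather than by the image content.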
Abstract
In this work we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region-proposal-based models, and ResNet-like image classification models. In addition, we created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
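The figure 10647 can be reproduced with a short calculation. The sketch below assumes a YOLOv3-style configuration that the abstract does not spell out: a 416x416 input, three output scales with strides 32, 16, and 8, and three anchor boxes per grid cell. The function name `count_proposals` and its parameters are illustrative.

```python
def count_proposals(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Count fixed box predictions across all output scales.

    Assumed configuration (not stated in the abstract): YOLOv3-style
    heads at three strides, each grid cell predicting a fixed number
    of anchor boxes.
    """
    total = 0
    for stride in strides:
        grid = input_size // stride  # 13, 26, 52 for the defaults
        total += grid * grid * anchors_per_cell
    return total


print(count_proposals())  # 10647 under the assumed defaults
```

Under these assumptions the three grids contribute 13x13, 26x26, and 52x52 cells, and multiplying their sum by three anchors gives the number in the title.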
Taxonomy
Topics: Hermeneutics and Narrative Identity · Aging, Elder Care, and Social Issues · Health, Medicine and Society
Methods: You Only Look Once
