Combining YOLO and Visual Rhythm for Vehicle Counting
Victor Nascimento Ribeiro, Nina S. T. Hirata

TL;DR
This paper introduces an efficient vehicle counting method that combines YOLO detection with Visual Rhythm to process only key frames, reaching 99.15% mean counting accuracy while running about three times faster than tracking-based methods.
Contribution
It proposes a novel vehicle counting approach that eliminates tracking by using Visual Rhythm to select key frames, enhancing efficiency and accuracy.
Findings
Achieves 99.15% mean counting accuracy.
Processes videos three times faster than tracking methods.
Applicable to other counting tasks in which the targets move in a single direction.
Abstract
Video-based vehicle detection and counting play a critical role in managing transport infrastructure. Traditional image-based counting methods usually involve two main steps: initial detection and subsequent tracking, which are applied to all video frames, leading to a significant increase in computational complexity. To address this issue, this work presents an alternative and more efficient method for vehicle detection and counting. The proposed approach eliminates the need for a tracking step and focuses solely on detecting vehicles in key video frames, thereby increasing its efficiency. To achieve this, we developed a system that combines YOLO, for vehicle detection, with Visual Rhythm, a way to create time-spatial images that allows us to focus on frames that contain useful information. Additionally, this method can be used for counting in any application involving unidirectional…
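The Visual Rhythm idea in the abstract can be illustrated with a minimal sketch: the pixels under a fixed counting line are taken from every frame and stacked over time, so each row of the resulting image corresponds to one frame. The function names, the synthetic frames, and the threshold are assumptions made for this illustration. The key-frame selection below uses simple background differencing as a stand-in and is not the authors' YOLO-based pipeline.

```python
import numpy as np


def visual_rhythm(frames: np.ndarray, line_row: int) -> np.ndarray:
    """Build a time-spatial image from a fixed horizontal counting line.

    frames: grayscale video of shape (T, H, W).
    Returns an array of shape (T, W); row t holds the line pixels of frame t.
    """
    return frames[:, line_row, :].copy()


def candidate_key_frames(vr: np.ndarray, background: np.ndarray,
                         threshold: float = 30.0) -> list:
    """Return the first frame index of each run of activity on the line.

    Hypothetical stand-in for mark detection: a frame is "active" when any
    line pixel differs from the background by more than `threshold`.
    """
    diff = np.abs(vr.astype(np.float64) - background.astype(np.float64))
    active = diff.max(axis=1) > threshold
    # A run starts where the line is active now but was not in the previous frame.
    previous = np.concatenate(([False], active[:-1]))
    return np.flatnonzero(active & ~previous).tolist()


# Synthetic video: two bright "vehicles" cross the counting line (row 4).
T, H, W = 20, 8, 16
frames = np.zeros((T, H, W), dtype=np.uint8)
frames[5:8, 4, 2:6] = 255     # first vehicle, frames 5 to 7
frames[12:15, 4, 9:13] = 255  # second vehicle, frames 12 to 14

vr = visual_rhythm(frames, line_row=4)
keys = candidate_key_frames(vr, background=np.zeros(W))
print(vr.shape, keys)
```

In this sketch only the frames listed in `keys` would be passed to a detector, which is where the saving over per-frame detection and tracking comes from.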
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Vehicle License Plate Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training · Focus
