TL;DR
This paper presents a novel learning-based post-processing pipeline for video object detection that enhances accuracy, especially for fast-moving objects, while maintaining low computational resource requirements.
Contribution
It introduces a learning-based similarity evaluation method that improves existing video detectors and adapts to efficient still image detectors like YOLO.
Findings
Improves detection accuracy for fast-moving objects.
Achieves comparable results to more complex detectors with fewer resources.
Enhances existing video detection pipelines with a novel post-processing approach.
Abstract
Object recognition in video is an important task for many applications, including autonomous driving perception, surveillance, wearable devices and IoT networks. Object recognition in video is more challenging than in still images because of blur, occlusions and rare object poses. The current state of the art is achieved either by dedicated video detectors with high computational cost or by standard image detectors combined with a fast post-processing algorithm. This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods by introducing a learning-based similarity evaluation between detections across frames. Our method improves the results of state-of-the-art dedicated video detectors, especially for fast-moving objects, and has low resource requirements. And applied to efficient still image detectors, such as…
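The idea of linking detections across frames through a similarity score can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract describes a learned similarity, whereas the sketch below substitutes plain box overlap (IoU) as a placeholder, and the greedy matching and score averaging are illustrative assumptions. All names (`Detection`, `placeholder_similarity`, `link_frames`, `rescore`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def placeholder_similarity(a: Detection, b: Detection) -> float:
    # Assumption: stands in for the learned similarity described in the
    # abstract; a trained model would be plugged in here instead.
    return iou(a.box, b.box)


def link_frames(
    prev: List[Detection],
    curr: List[Detection],
    similarity: Callable[[Detection, Detection], float] = placeholder_similarity,
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of detections in consecutive frames."""
    candidates = sorted(
        ((similarity(p, c), i, j)
         for i, p in enumerate(prev)
         for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr, links = set(), set(), []
    for sim, i, j in candidates:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        links.append((i, j))
    return links


def rescore(
    prev: List[Detection],
    curr: List[Detection],
    links: List[Tuple[int, int]],
) -> List[Detection]:
    """Average the confidence of each linked pair (illustrative choice)."""
    out = [Detection(d.box, d.score) for d in curr]
    for i, j in links:
        out[j].score = (prev[i].score + curr[j].score) / 2
    return out


prev_frame = [Detection((0, 0, 10, 10), 0.9), Detection((50, 50, 60, 60), 0.8)]
curr_frame = [Detection((51, 51, 61, 61), 0.4), Detection((1, 1, 11, 11), 0.5)]
links = link_frames(prev_frame, curr_frame)
refined = rescore(prev_frame, curr_frame, links)
```

Because `similarity` is passed in as a parameter, the overlap-based placeholder can be replaced by any function that scores a pair of detections, which is where a learned model would fit. Pure overlap tends to fail when an object moves far between frames, the case the abstract highlights for fast-moving objects.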
Taxonomy
Methods: You Only Look Once
