Blending of Learning-based Tracking and Object Detection for Monocular Camera-based Target Following
Pranoy Panda, Martin Barczyk

TL;DR
This paper introduces a real-time, lightweight method that combines learning-based tracking, object detection, and re-identification to improve target recovery during occlusions and motion blur in monocular camera-based tracking.
Contribution
The paper presents a fusion approach that makes convolutional recurrent neural network-based trackers more robust when the target belongs to a category of familiar objects, while keeping high frame rates and competitive benchmark performance.
Findings
Achieves 85-90 FPS in real-time tracking.
Improves recovery after occlusions and motion blur.
Attains competitive results on benchmark datasets.
Abstract
Deep learning has recently started being applied to visual tracking of generic objects in video streams. For the purposes of robotics applications, it is very important for a target tracker to recover its track if it is lost due to heavy or prolonged occlusions or motion blur of the target. We present a real-time approach which fuses a generic target tracker and object detection module with a target re-identification module. Our work focuses on improving the performance of Convolutional Recurrent Neural Network-based object trackers in cases where the object of interest belongs to the category of "familiar" objects. Our proposed approach is sufficiently lightweight to track objects at 85-90 FPS while attaining competitive results on challenging benchmarks.
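The fusion described in the abstract can be pictured as a per-frame decision: trust the tracker while it appears reliable, otherwise run the detector and let a re-identification step pick the detection that best matches the target. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how a lost track is recognised or how re-identification scores candidates, so the tracker confidence score, the cosine-similarity matching, both thresholds, and the names `Detection`, `reidentify`, and `fuse_step` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    """One detector output: a box plus an appearance feature vector (assumed)."""
    box: Box
    feature: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reidentify(
    template: Sequence[float],
    detections: List[Detection],
    min_similarity: float = 0.8,  # hypothetical threshold
) -> Optional[Detection]:
    """Return the detection most similar to the target template, or None
    if no detection reaches min_similarity."""
    best: Optional[Detection] = None
    best_sim = min_similarity
    for det in detections:
        sim = cosine_similarity(template, det.feature)
        if sim >= best_sim:
            best, best_sim = det, sim
    return best


def fuse_step(
    tracker_box: Box,
    tracker_confidence: float,  # assumption: the tracker exposes a score
    template: Sequence[float],
    detect: Callable[[], List[Detection]],
    conf_threshold: float = 0.5,  # hypothetical threshold
) -> Tuple[Optional[Box], bool]:
    """Return (box, recovered). The detector is called only when the tracker
    looks unreliable, so the common case stays cheap."""
    if tracker_confidence >= conf_threshold:
        return tracker_box, False
    match = reidentify(template, detect())
    if match is None:
        return None, False  # target still lost in this frame
    return match.box, True  # caller would re-initialise the tracker here
```

In this sketch the detector and re-identification run only on frames where the tracker is judged unreliable, which is one plausible way to keep the per-frame cost low; whether the paper schedules the modules this way is not stated in the abstract.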
