Object Tracking by Detection with Visual and Motion Cues
Niels Ole Salscheider

TL;DR
This paper introduces an online tracking-by-detection method that combines CNN-estimated appearance and motion cues with a Kalman filter; on the BDD100K dataset its multi-frame model reaches 39.1% MOTA.
Contribution
It proposes an efficient multi-frame CNN architecture that estimates the metrics used for track-to-detection assignment, and combines them with a constant-velocity Kalman filter and an assignment heuristic.
Findings
Multi-frame model achieves 39.1% MOTA on BDD100K
Single-frame model achieves 36.8% MOTA with low localization error
Efficient CNN architecture estimates appearance and motion metrics
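For readers unfamiliar with the metric, MOTA (Multiple Object Tracking Accuracy) is commonly defined in the CLEAR MOT framework as one minus the ratio of summed errors (false negatives, false positives, identity switches) to the number of ground-truth objects, all accumulated over every frame. The sketch below only illustrates that formula; the function name and the counts are made up for illustration and are not taken from the paper or from BDD100K.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Illustrative counts only (hypothetical, not results from the paper).
example_score = mota(false_negatives=400, false_positives=150,
                     id_switches=50, num_ground_truth=1000)
print(f"MOTA = {example_score:.1%}")
```

Because the errors are summed before dividing, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.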
Abstract
Self-driving cars and other autonomous vehicles need to detect and track objects in camera images. We present a simple online tracking algorithm that is based on a constant velocity motion model with a Kalman filter, and an assignment heuristic. The assignment heuristic relies on four metrics: An embedding vector that describes the appearance of objects and can be used to re-identify them, a displacement vector that describes the object movement between two consecutive video frames, the Mahalanobis distance between the Kalman filter states and the new detections, and a class distance. These metrics are combined with a linear SVM, and then the assignment problem is solved by the Hungarian algorithm. We also propose an efficient CNN architecture that estimates these metrics. Our multi-frame model accepts two consecutive video frames which are processed individually in the backbone, and…
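The assignment step described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not from the paper: the weights are hand-picked (the paper combines the metrics with a linear SVM), the covariance is a fixed 2x2 matrix, the displacement vector is assumed to point from an object's previous position to its current one, and a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment, but only scales to a handful of objects). All names are hypothetical.

```python
from itertools import permutations
from math import dist, sqrt


def predict_constant_velocity(state, dt=1.0):
    """Kalman-style prediction step for a state (x, y, vx, vy), mean only."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def mahalanobis_2d(pred_xy, det_xy, cov):
    """Mahalanobis distance between two 2D points for a 2x2 covariance."""
    (a, b), (c, d) = cov
    determinant = a * d - b * c
    dx, dy = det_xy[0] - pred_xy[0], det_xy[1] - pred_xy[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / determinant
    return sqrt(quad)


def pair_cost(track, det, cov, weights):
    """Weighted sum of the four metrics (weights are assumed, not learned)."""
    pred = predict_constant_velocity(track["state"])
    prev_xy = track["state"][:2]
    # Back-project the detection with its displacement vector (assumed sign).
    back_xy = (det["xy"][0] - det["disp"][0], det["xy"][1] - det["disp"][1])
    metrics = (
        dist(track["emb"], det["emb"]),              # appearance distance
        dist(prev_xy, back_xy),                      # displacement residual
        mahalanobis_2d(pred[:2], det["xy"], cov),    # motion-model distance
        0.0 if track["cls"] == det["cls"] else 1.0,  # class distance
    )
    return sum(w * m for w, m in zip(weights, metrics))


def assign(tracks, dets, cov, weights):
    """Brute-force minimum-cost assignment; requires len(dets) >= len(tracks)."""
    cost = [[pair_cost(t, d, cov, weights) for d in dets] for t in tracks]
    return min(
        permutations(range(len(dets)), len(tracks)),
        key=lambda p: sum(cost[i][j] for i, j in enumerate(p)),
    )


tracks = [
    {"state": (10.0, 10.0, 2.0, 0.0), "emb": (1.0, 0.0), "cls": "car"},
    {"state": (50.0, 20.0, -1.0, 0.0), "emb": (0.0, 1.0), "cls": "person"},
]
dets = [
    {"xy": (49.0, 20.0), "disp": (-1.0, 0.0), "emb": (0.1, 0.9), "cls": "person"},
    {"xy": (12.0, 10.0), "disp": (2.0, 0.0), "emb": (0.9, 0.1), "cls": "car"},
]
identity_cov = ((1.0, 0.0), (0.0, 1.0))
assignment = assign(tracks, dets, identity_cov, weights=(1.0, 0.5, 0.5, 2.0))
print(assignment)  # assignment[i] is the detection index matched to track i
```

In this toy scene the car track is matched to the second detection and the person track to the first. A full tracker would also update the Kalman covariance, gate implausible matches, and handle unmatched tracks and detections, none of which is shown here.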
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: Support Vector Machine
