Deep Affinity Network for Multiple Object Tracking
ShiJie Sun, Naveed Akhtar, HuanSheng Song, Ajmal Mian, Mubarak Shah

TL;DR
This paper introduces a Deep Affinity Network that leverages deep learning to improve data association in multiple object tracking by jointly modeling object appearances and affinities, resulting in state-of-the-art performance.
Contribution
The paper presents an end-to-end deep learning framework that jointly models object appearance and affinity for improved data association in MOT, outperforming existing methods.
Findings
Reports top performance on the MOT15, MOT17, and UA-DETRAC benchmarks.
Effectively models object appearance and affinity jointly.
Handles object appearance/disappearance dynamically.
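To make the data association step behind these findings concrete, here is a minimal sketch of matching detections across two frames from an affinity matrix. It is not the paper's network: the learned affinities are replaced by plain cosine similarity between hand-made feature vectors, and the matching is a simple greedy pass. The names `cosine`, `affinity_matrix`, `associate`, and the `min_affinity` threshold are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def affinity_matrix(prev_feats, curr_feats):
    """Rows index objects in the previous frame, columns the current frame."""
    return [[cosine(p, c) for c in curr_feats] for p in prev_feats]


def associate(prev_feats, curr_feats, min_affinity=0.5):
    """Greedy one-to-one matching on the affinity matrix (a stand-in, not the paper's method).

    Returns (matches, disappeared, appeared):
      matches     maps a previous-frame index to a current-frame index
      disappeared lists previous-frame objects left without a match
      appeared    lists current-frame objects left without a match
    """
    aff = affinity_matrix(prev_feats, curr_feats)
    # Visit candidate pairs from highest to lowest affinity.
    pairs = sorted(
        ((aff[i][j], i, j)
         for i in range(len(prev_feats))
         for j in range(len(curr_feats))),
        reverse=True,
    )
    matches = {}
    used_curr = set()
    for score, i, j in pairs:
        if score < min_affinity:
            break  # every remaining pair is weaker than the threshold
        if i in matches or j in used_curr:
            continue  # keep the assignment one-to-one
        matches[i] = j
        used_curr.add(j)
    disappeared = [i for i in range(len(prev_feats)) if i not in matches]
    appeared = [j for j in range(len(curr_feats)) if j not in used_curr]
    return matches, disappeared, appeared


# Hypothetical 2-D appearance features for three objects per frame.
prev = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
curr = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
matches, disappeared, appeared = associate(prev, curr)
```

In this toy input, two objects carry over between the frames, while one previous object finds no partner and one current detection is treated as new. That is the kind of appearance and disappearance a tracker has to handle at every frame pair.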
Abstract
Multiple Object Tracking (MOT) plays an important role in solving many fundamental problems in video analysis in computer vision. Most MOT methods employ two steps: Object Detection and Data Association. The first step detects objects of interest in every frame of a video, and the second establishes correspondence between the detected objects in different frames to obtain their tracks. Object detection has made tremendous progress in the last few years due to deep learning. However, data association for tracking still relies on hand-crafted constraints such as appearance, motion, spatial proximity, grouping, etc. to compute affinities between the objects in different frames. In this paper, we harness the power of deep learning for data association in tracking by jointly modelling object appearances and their affinities between different frames in an end-to-end fashion. The proposed Deep…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Visual Attention and Saliency Detection
