Multiple Object Tracking from appearance by hierarchically clustering tracklets
Andreu Girbau, Ferran Marqués, Shin'ichi Satoh

TL;DR
This paper presents a novel multiple object tracking method that primarily uses appearance features and hierarchical clustering of tracklets, achieving state-of-the-art results on DanceTrack and competitive performance on MOT17 and MOT20.
Contribution
It introduces a hierarchical clustering approach for tracklet fusion based on appearance similarity, emphasizing appearance as the main association cue in MOT.
Findings
Shown to be effective on three MOT benchmarks: MOT17, MOT20, and DanceTrack.
Achieves state-of-the-art results on DanceTrack.
Competitive performance on MOT17 and MOT20.
Abstract
Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting factors. We form initial tracklets by leveraging the idea that instances of an object that are close in time should be similar in appearance, and build the final object tracks by fusing the tracklets in a hierarchical fashion. We conduct extensive experiments that show the effectiveness of our method on three different MOT benchmarks (MOT17, MOT20, and DanceTrack), being competitive on MOT17 and MOT20 and establishing state-of-the-art results on DanceTrack.
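To make the idea of fusing tracklets hierarchically by appearance more concrete, here is a minimal Python sketch of greedy agglomerative merging. It is not the authors' implementation. Everything named here is an assumption for illustration: the function `fuse_tracklets`, the `max_dist` threshold, the use of cosine distance between mean appearance features, and the rule that two tracklets sharing a frame cannot belong to the same object. The spatial and temporal weighting priors mentioned in the abstract are omitted.

```python
import math


def mean_vector(feats):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def fuse_tracklets(tracklets, max_dist=0.3):
    """Greedily merge the closest pair of tracklets until none qualifies.

    tracklets: list of dicts with 'frames' (set of frame indices) and
    'feats' (one appearance vector per detection).
    max_dist: hypothetical merge threshold, not taken from the paper.
    """
    clusters = [
        {"members": [i], "frames": set(t["frames"]), "feats": list(t["feats"])}
        for i, t in enumerate(tracklets)
    ]
    while True:
        best = None  # (distance, index_i, index_j) of the best candidate pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed constraint: one object cannot appear twice in a frame.
                if clusters[i]["frames"] & clusters[j]["frames"]:
                    continue
                d = cosine_distance(
                    mean_vector(clusters[i]["feats"]),
                    mean_vector(clusters[j]["feats"]),
                )
                if d <= max_dist and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "frames": clusters[i]["frames"] | clusters[j]["frames"],
            "feats": clusters[i]["feats"] + clusters[j]["feats"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


# Toy input: tracklets 0 and 1 look alike and do not overlap in time;
# tracklet 2 looks different and overlaps both.
tracklets = [
    {"frames": {0, 1}, "feats": [[1.0, 0.0], [1.0, 0.1]]},
    {"frames": {5, 6}, "feats": [[0.95, 0.05], [1.0, 0.05]]},
    {"frames": {0, 1, 5, 6}, "feats": [[0.0, 1.0]] * 4},
]
groups = sorted(sorted(c["members"]) for c in fuse_tracklets(tracklets))
print(groups)
```

In this toy input, tracklets 0 and 1 end up in one track and tracklet 2 stays on its own. A real system would obtain the appearance vectors from a re-identification network and would need a more efficient pairing strategy than this quadratic search.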
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Human Pose and Action Recognition
