UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance
Son Tran, Cong Tran, Anh Tran, Cuong Pham

TL;DR
This paper introduces UnsMOT, an unsupervised multi-object tracking framework that combines appearance, motion, and geometric information to improve tracking accuracy without requiring annotated data.
Contribution
The UnsMOT framework integrates appearance, motion, and geometric features, extracted with CNN, RNN, and GNN models respectively, for unsupervised multi-object tracking.
Findings
Advances the state-of-the-art performance of unsupervised MOT methods.
Effectively combines appearance, motion, and geometric cues.
Improves tracking accuracy without labeled training data.
Abstract
Object detection has long been a topic of high interest in computer vision literature. Motivated by the fact that annotating data for the multi-object tracking (MOT) problem is immensely expensive, recent studies have turned their attention to the unsupervised learning setting. In this paper, we push forward the state-of-the-art performance of unsupervised MOT methods by proposing UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information to provide more accurate tracking. Specifically, we first extract the appearance and motion features using CNN and RNN models, respectively. Then, we construct a graph of objects based on their relative distances in a frame, which is fed into a GNN model together with CNN features to output geometric embedding of objects optimized using an unsupervised loss function. Finally, associations…
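The graph step described in the abstract (objects in a frame linked by their relative distances, then passed to a GNN together with CNN features) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed `radius` threshold, and the untrained mean aggregation are all assumptions chosen for illustration, and the toy feature vectors stand in for CNN appearance features. The paper's actual GNN architecture and unsupervised loss are not shown here.

```python
import math


def build_distance_graph(centers, radius):
    """Link every pair of objects whose box centers lie within `radius` pixels.

    `radius` is a hypothetical threshold; the abstract only says the graph
    is built from relative distances between objects in a frame.
    """
    n = len(centers)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) <= radius:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def mean_aggregate(features, adjacency):
    """One untrained message-passing step: average a node with its neighbors.

    A real GNN layer would apply learned weights; this only shows how
    neighborhood structure mixes per-object features.
    """
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in adjacency[i]]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out


# Toy frame: two nearby objects and one far away.
centers = [(10.0, 10.0), (12.0, 11.0), (200.0, 50.0)]
# Placeholder vectors standing in for CNN appearance features.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

graph = build_distance_graph(centers, radius=20.0)
embeddings = mean_aggregate(features, graph)
```

In this toy frame the two nearby objects become neighbors and their features are averaged, while the isolated object keeps its own vector. A learned GNN would replace the plain average with trainable transformations.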
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
