IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking
Run Luo, Zikai Song, Longze Chen, Yunshui Li, Min Yang, Wei Yang

TL;DR
IP-MOT is an end-to-end, transformer-based tracker that uses instance-level pseudo textual descriptions and a query-balanced strategy to improve multi-object tracking performance across domains.
Contribution
The paper presents an end-to-end transformer model for multi-object tracking (MOT) that uses pseudo textual descriptions and a query-balanced strategy to improve cross-domain generalization.
Findings
Achieves competitive same-domain MOT performance.
Significantly improves cross-domain tracking accuracy.
Demonstrates effectiveness on multiple benchmark datasets.
Abstract
Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain, resulting in a lack of cross-domain generalizability to data from other domains. While several works have introduced natural language representation to bridge the domain gap in visual tracking, these textual descriptions often provide too high-level a view and fail to distinguish various instances within the same class. In this paper, we address this limitation by developing IP-MOT, an end-to-end transformer model for MOT that operates without concrete textual descriptions. Our approach is underpinned by two key innovations: Firstly, leveraging a pre-trained vision-language model, we obtain instance-level pseudo textual descriptions via prompt-tuning,…
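The abstract describes the first idea only at a high level. The sketch below is a rough illustration of how instance-level pseudo text could be learned by prompt-tuning. It assumes a CoOp/CoCoOp-style design, with shared learnable context vectors shifted by an instance-conditioned bias and passed through a frozen text encoder, and it is not IP-MOT's actual architecture or loss. Every component is a toy stand-in: the frozen "encoder" is a random linear map rather than a real vision-language model. The names `pseudo_text`, `prompt_tune`, `W_TEXT`, `W_META`, the InfoNCE objective, and all dimensions are hypothetical. Finite differences stand in for autograd so the example needs only NumPy.

```python
import numpy as np

# Toy dimensions (hypothetical, not from the paper).
D, M, N = 8, 4, 3   # embedding dim, learnable context tokens, instances of one class
TAU = 0.1           # softmax temperature (assumed)

rng = np.random.default_rng(0)

# Frozen stand-ins for a pre-trained vision-language model (hypothetical).
W_TEXT = rng.standard_normal((D, D)) / np.sqrt(D)  # toy "text encoder" projection
W_META = 0.1 * rng.standard_normal((D, D))         # toy CoCoOp-style meta-net
CLASS_TOKEN = rng.standard_normal(D)               # word embedding for a class, e.g. "person"


def l2n(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pseudo_text(context, visual_feat):
    """Instance-level pseudo text embedding for one object.

    The shared learnable context (M x D) is shifted by a bias computed from the
    instance's visual feature, so same-class instances receive different prompts.
    """
    tokens = np.vstack([context + W_META @ visual_feat, CLASS_TOKEN])
    return l2n(W_TEXT @ tokens.mean(axis=0))  # frozen toy encoder: mean-pool + linear


def contrastive_loss(context, visual):
    """InfoNCE: each instance's pseudo text should match its own visual feature."""
    text = np.stack([pseudo_text(context, v) for v in visual])  # (N, D)
    logits = text @ visual.T / TAU                               # (N, N) cosine / tau
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def num_grad(f, x, eps=1e-5):
    """Central finite differences (stand-in for autograd in this sketch)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return g


def prompt_tune(visual, steps=20, lr=0.05):
    """Optimize only the context vectors; the encoder and meta-net stay frozen."""
    context = 0.02 * rng.standard_normal((M, D))
    loss = contrastive_loss(context, visual)
    history = [loss]
    for _ in range(steps):
        g = num_grad(lambda c: contrastive_loss(c, visual), context)
        step = lr
        for _ in range(30):  # backtrack: accept a step only if the loss decreases
            cand = context - step * g
            cand_loss = contrastive_loss(cand, visual)
            if cand_loss < loss:
                context, loss = cand, cand_loss
                break
            step /= 2
        history.append(loss)
    return context, history


# Usage: hypothetical unit-norm visual features for N same-class instances.
visual = l2n(rng.standard_normal((N, D)))
context, history = prompt_tune(visual)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}")
```

The sketch targets the limitation the abstract raises. A single class prompt such as "person" gives every pedestrian the same text embedding. Conditioning the learned context on each instance's visual feature instead yields distinct per-instance embeddings that can be contrasted against one another.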
Taxonomy
Topics: Video Surveillance and Tracking Methods · Data Stream Mining Techniques · Fire Detection and Safety Systems
