IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking

Run Luo; Zikai Song; Longze Chen; Yunshui Li; Min Yang; Wei Yang

arXiv:2410.23907 · cs.CV · November 1, 2024

Open Access

TL;DR

IP-MOT is an end-to-end transformer-based tracker that uses instance-level pseudo textual descriptions and a query-balanced strategy to improve cross-domain multi-object tracking.

Contribution

The paper presents a novel end-to-end transformer model for MOT that leverages pseudo textual descriptions and a query-balanced strategy to improve cross-domain generalization.

Findings

01. Achieves competitive same-domain MOT performance.

02. Significantly improves cross-domain tracking accuracy.

03. Demonstrates effectiveness on multiple benchmark datasets.

Abstract

Multi-Object Tracking (MOT) aims to associate multiple objects across video frames and is a challenging vision task due to inherent complexities in the tracking environment. Most existing approaches train and track within a single domain and therefore generalize poorly to data from other domains. While several works have introduced natural language representation to bridge the domain gap in visual tracking, these textual descriptions are often too high-level and fail to distinguish individual instances within the same class. In this paper, we address this limitation by developing IP-MOT, an end-to-end transformer model for MOT that operates without concrete textual descriptions. Our approach is underpinned by two key innovations: first, leveraging a pre-trained vision-language model, we obtain instance-level pseudo textual descriptions via prompt-tuning,…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Data Stream Mining Techniques · Fire Detection and Safety Systems