TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT

Duy Le Dinh Anh; Kim Hoang Tran; Ngan Hoang Le

arXiv:2409.02490 · cs.CV · September 5, 2024


TL;DR

This paper introduces TP-GMOT, a zero-shot, text-prompt-based framework for generic multiple object tracking that handles unseen object categories and complex scenarios through novel detection and association components, and it also introduces a new dataset with fine-grained textual descriptions.

Contribution

The paper presents a new dataset, Refer-GMOT, and a novel zero-shot tracking framework, TP-GMOT, that combines text-prompt-based detection with association integrating motion and appearance cues, advancing GMOT capabilities.
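The association component (MAC-SORT) is not described in the excerpt on this page. As a rough, hypothetical illustration of what "integrating motion and appearance" usually means in SORT-style trackers, the sketch below blends an IoU-based motion cost with a cosine appearance cost into one cost matrix and solves the assignment. The weight `lam`, the gate `max_cost`, and all function names are assumptions, not taken from the paper or the Fsoft-AIC/TP-GMOT repository. Track boxes are assumed to be already motion-predicted (e.g. by a Kalman filter), and a brute-force search stands in for the Hungarian algorithm that SORT uses, so it is only practical for a handful of objects.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Obj = Tuple[Box, Sequence[float]]        # (box, appearance embedding)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def fused_cost(track: Obj, det: Obj, lam: float = 0.5) -> float:
    # Hypothetical linear blend: lam weights the motion cost (1 - IoU)
    # against the appearance cost (1 - cosine similarity).
    # This is NOT the paper's exact MAC formula.
    motion = 1.0 - iou(track[0], det[0])
    appearance = 1.0 - cosine_similarity(track[1], det[1])
    return lam * motion + (1.0 - lam) * appearance


def associate(tracks: List[Obj], dets: List[Obj],
              lam: float = 0.5, max_cost: float = 0.7) -> List[Tuple[int, int]]:
    """Return sorted (track_idx, det_idx) pairs minimizing total fused cost.

    Exhaustive search over permutations gives the same optimum as the
    Hungarian algorithm but scales factorially. Matched pairs whose cost
    exceeds the hypothetical gate `max_cost` are dropped (left unmatched).
    """
    cost = [[fused_cost(t, d, lam) for d in dets] for t in tracks]
    n, m = len(tracks), len(dets)
    if n <= m:
        # Every track gets a distinct detection.
        candidates = (list(zip(range(n), cols)) for cols in permutations(range(m), n))
    else:
        # Every detection gets a distinct track.
        candidates = (list(zip(rows, range(m))) for rows in permutations(range(n), m))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    return sorted((i, j) for i, j in best if cost[i][j] <= max_cost)


if __name__ == "__main__":
    tracks = [((0, 0, 10, 10), [1.0, 0.0]), ((20, 20, 30, 30), [0.0, 1.0])]
    dets = [((21, 21, 31, 31), [0.0, 1.0]), ((1, 1, 11, 11), [1.0, 0.0])]
    print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

In the usage example, each track overlaps and shares an embedding with the detection listed in the opposite position, so the fused cost pairs track 0 with detection 1 and track 1 with detection 0. A linear blend is only one simple way to combine the two cues; real trackers may instead gate on one cue before scoring with the other.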

Findings

01

Effective in tracking unseen object categories.

02

Outperforms existing methods on GMOT benchmarks.

03

Demonstrates strong generalization on multiple datasets.

Abstract

While Multi-Object Tracking (MOT) has made substantial advancements, it relies heavily on prior knowledge and is limited to predefined categories. In contrast, Generic Multiple Object Tracking (GMOT), which tracks multiple objects with similar appearance, requires less prior information about the targets but faces challenges from variations in viewpoint, lighting, occlusion, and resolution. Our contributions commence with the introduction of the Refer-GMOT dataset, a collection of videos, each accompanied by fine-grained textual descriptions of their attributes. Subsequently, we introduce a novel text-prompt-based open-vocabulary GMOT framework, called TP-GMOT, which can track never-seen object categories with zero training examples. Within the TP-GMOT framework, we introduce two novel components: (i) TP-OD, an object detection…

Code & Models

Repositories

Fsoft-AIC/TP-GMOT
PyTorch · Official
