LaMOT: Language-Guided Multi-Object Tracking

Yunhao Li; Xiaoqiong Liu; Luke Liu; Heng Fan; Libo Zhang

arXiv:2406.08324 · cs.CV · June 13, 2024 · 1 citation

TL;DR

This paper introduces LaMOT, a new benchmark and framework for vision-language multi-object tracking, enabling tracking based on natural language commands and providing a standardized platform for evaluation.

Contribution

It presents the first large-scale benchmark, LaMOT, and a simple tracker, LaMOTer, to advance research in language-guided multi-object tracking.

Findings

1. The LaMOT benchmark includes 1,660 sequences from 4 datasets.
2. Provides a unified evaluation platform for Vision-Language MOT.
3. Introduces LaMOTer, an effective baseline tracker.

Abstract

Vision-Language MOT is a crucial tracking problem and has drawn increasing attention recently. It aims to track objects based on human language commands, replacing the templates or pre-set information from training sets used in conventional tracking tasks. Despite various efforts, a key challenge lies in the lack of a clear understanding of why language is used for tracking, which hinders further development in this field. In this paper, we address this challenge by introducing Language-Guided MOT, a unified task framework, along with a corresponding large-scale benchmark, termed LaMOT, which encompasses diverse scenarios and language descriptions. Specifically, LaMOT comprises 1,660 sequences from 4 different datasets and aims to unify various Vision-Language MOT tasks while providing a standardized evaluation platform. To ensure high-quality annotations, we manually assign…

Code & Models

Repositories

nathan-li123/lamot (Official)

Taxonomy

Topics: Video Surveillance and Tracking Methods