Awesome Multi-modal Object Tracking

Chunhui Zhang; Li Liu; Hao Wen; Xi Zhou; Yanfeng Wang

arXiv:2405.14200·cs.CV·June 3, 2024·1 cites

TL;DR

This paper provides a comprehensive review of multi-modal object tracking, categorizing existing tasks, analyzing datasets and algorithms, and highlighting recent advances and challenges in integrating multiple data modalities.

Contribution

It offers a systematic categorization and analysis of MMOT tasks, datasets, and algorithms, and summarizes recent progress and future directions in the field.

Findings

01

Existing MMOT algorithms mainly focus on two modalities.

02

Recent efforts aim for unified models for any modality.

03

Large-scale multi-modal benchmarks have been established.

Abstract

Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language, and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications, such as autonomous driving and intelligent surveillance. In recent years, MMOT has received increasing attention. However, existing MMOT algorithms mainly focus on two modalities (e.g., RGB+depth, RGB+thermal infrared, and RGB+language). To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality. Additionally, some large-scale multi-modal tracking benchmarks have been established that simultaneously provide more than two modalities, such as vision-language-audio (e.g., WebUAV-3M) and vision-depth-language (e.g., UniMod1K). To track the…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Infrared Target Detection Methodologies
