Multi-view Crowd Tracking Transformer with View-Ground Interactions Under Large Real-World Scenes

Qi Zhang; Jixuan Chen; Kaiyi Zhang; Xinquan Yu; Antoni B. Chan; Hui Huang

arXiv:2604.19318 · cs.CV · April 22, 2026

TL;DR

This paper introduces MVTrackTrans, a Transformer-based multi-view crowd tracking model, together with two large real-world datasets, MVCrowdTrack and CityTrack, and reports improved performance over existing methods in complex scenes.

Contribution

The paper presents a novel Transformer-based model for multi-view crowd tracking and introduces two large real-world datasets for better evaluation.

Findings

01 MVTrackTrans outperforms existing methods on new large datasets.

02 The datasets contain larger scenes and longer sequences than previous benchmarks.

03 View-ground interactions enhance multi-view tracking accuracy.

Abstract

Multi-view crowd tracking estimates each person's trajectory on the ground plane of the scene. Recent work mainly relies on CNN-based multi-view crowd tracking architectures, most of which are evaluated and compared on relatively small datasets such as Wildtrack and MultiviewX. Because these two datasets are collected in small scenes and contain only tens of frames in the evaluation stage, it is difficult to apply current methods to real-world applications where scene size and occlusion are more complicated. In this paper, we propose a Transformer-based multi-view crowd tracking model, MVTrackTrans, which adopts interactions between camera views and the ground plane for enhanced multi-view tracking performance. Besides, for better evaluation, we collect and label two large real-world multi-view tracking datasets, MVCrowdTrack and CityTrack, which…
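
For readers new to the setting, the link between a camera view and the ground plane is commonly expressed as a planar homography. The sketch below is a generic illustration of that idea and is not taken from the paper or its repository: the function name `image_to_ground`, the matrix `H`, and the sample foot points are all hypothetical, and MVTrackTrans itself may relate views and ground quite differently.

```python
import numpy as np

def image_to_ground(points_px, H):
    """Map N x 2 pixel coordinates to ground-plane coordinates.

    H is an assumed 3 x 3 homography from image pixels to ground units.
    """
    pts = np.asarray(points_px, dtype=float)
    # Lift to homogeneous coordinates: (u, v) -> (u, v, 1).
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    # Divide by the last coordinate to return to 2D.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography: scale pixels by 0.01 and shift the origin.
H = np.array([[0.01, 0.0, -1.0],
              [0.0, 0.01, -2.0],
              [0.0, 0.0, 1.0]])

# Hypothetical foot positions of two people in one camera view.
feet = [[100.0, 200.0], [300.0, 400.0]]
ground = image_to_ground(feet, H)
```

With one such mapping per camera, detections from several views land in a shared ground coordinate frame, which is where the per-person trajectories described above are defined.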

Code & Models

Repositories

zqyq/MVTrackTrans (GitHub)
