MV-TAP: Tracking Any Point in Multi-View Videos

Jahyeok Koo; Inès Hyeonsu Kim; Mungyeom Kim; Junghyun Park; Seohyun Park; Jaeyeong Kim; Jung Yi; Seokju Cho; Seungryong Kim

arXiv:2512.02006·cs.CV·December 2, 2025

Open Access

TL;DR

MV-TAP is a multi-view point tracker that uses camera geometry and cross-view attention to improve trajectory estimation in dynamic scenes. It is trained on a large-scale synthetic dataset and evaluated on real-world multi-view sets.

Contribution

Introduces MV-TAP, a multi-view point tracker that leverages camera geometry and cross-view attention, together with a large-scale synthetic training dataset and real-world evaluation sets for multi-view tracking.

Findings

01

MV-TAP outperforms existing point-tracking methods on challenging multi-view benchmarks.

02

Aggregating spatio-temporal information across views yields more complete and reliable trajectory estimates.

03

Extensive experiments support these gains and establish MV-TAP as a baseline for multi-view point tracking research.

Abstract

Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to various applications. In this work, we present MV-TAP, a novel point tracker that tracks points across multi-view videos of dynamic scenes by leveraging cross-view information. MV-TAP utilizes camera geometry and a cross-view attention mechanism to aggregate spatio-temporal information across views, enabling more complete and reliable trajectory estimation in multi-view videos. To support this task, we construct a large-scale synthetic training dataset and real-world evaluation sets tailored for multi-view tracking. Extensive experiments demonstrate that MV-TAP outperforms existing point-tracking methods on challenging benchmarks, establishing an effective baseline for advancing research in multi-view point tracking.
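The abstract does not spell out how the cross-view attention is built, so the following is only a minimal sketch of the general idea, not the paper's architecture. It assumes each view supplies one feature vector for the same tracked point at one time step, and it uses plain scaled dot-product attention with no learned projections and no camera-geometry encoding. The function name `cross_view_attention` and the input layout are hypothetical.

```python
import math


def cross_view_attention(features):
    """Mix per-view features of one tracked point at one time step.

    features: list of V feature vectors (one per view), each a list of D floats.
    Returns V updated vectors; each is a softmax-weighted average of all views.
    Assumption: queries, keys, and values are the raw features themselves.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for query in features:
        # Scaled dot-product similarity between this view and every view.
        scores = [
            scale * sum(q * k for q, k in zip(query, key)) for key in features
        ]
        # Numerically stable softmax over the view axis.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the per-view features.
        updated.append(
            [
                sum(w * value[d] for w, value in zip(weights, features))
                for d in range(dim)
            ]
        )
    return updated


if __name__ == "__main__":
    # Three views, two-dimensional features (illustrative values only).
    views = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in cross_view_attention(views):
        print([round(x, 3) for x in row])
```

In this sketch a view whose feature resembles another view's draws more heavily on it, which is one way information from a view where the point is visible could reach a view where it is occluded. A real tracker would add learned projections and a geometry-aware encoding of the camera parameters.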

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging