# Multi-View 3D Point Tracking

**Authors:** Frano Rajič, Haofei Xu, Marko Mihajlovic, Siyuan Li, Irem Demir, Emircan Gündoğdu, Lei Ke, Sergey Prokudin, Marc Pollefeys, Siyu Tang

arXiv: 2508.21060 · 2025-08-29

## TL;DR

This paper presents the first data-driven multi-view 3D point tracker. It tracks arbitrary points in dynamic scenes from a practical number of cameras (e.g., four), using a feed-forward network trained on synthetic multi-view Kubric sequences.

## Contribution

Introduces a feed-forward multi-view 3D point tracker that fuses multi-view features into a unified point cloud, applies k-nearest-neighbors correlation with a transformer-based update, and is trained on 5K synthetic Kubric sequences yet evaluated on real-world benchmarks.

## Key findings

- Achieves median trajectory errors of 3.1 cm on Panoptic Studio and 2.0 cm on DexYCB.
- Generalizes to 1-8 camera views with varying vantage points.
- Operates effectively with video lengths of 24-150 frames.
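
To make the headline metric concrete, here is a minimal sketch of a median trajectory error computation. It assumes the metric is the median Euclidean distance between predicted and ground-truth 3D positions over all tracked points and frames; the paper's exact protocol (for example, visibility masking or per-sequence averaging) may differ. The function name `median_trajectory_error` and the toy data are hypothetical.

```python
import math
import statistics


def median_trajectory_error(pred, gt):
    """Median Euclidean distance between predicted and ground-truth 3D points.

    pred, gt: lists of tracks; each track is a list of (x, y, z) tuples in
    metres, one per frame. Both are assumed to have identical shape.
    This is an illustrative definition, not the paper's evaluation code.
    """
    errors = [
        math.dist(p, g)
        for pred_track, gt_track in zip(pred, gt)
        for p, g in zip(pred_track, gt_track)
    ]
    return statistics.median(errors)


# Toy data: two tracks, two frames each (values are made up).
gt = [
    [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
    [(0.5, 0.5, 2.0), (0.5, 0.6, 2.0)],
]
pred = [
    [(0.0, 0.0, 1.0), (0.1, 0.0, 1.03)],
    [(0.5, 0.5, 2.02), (0.5, 0.6, 2.05)],
]

# Convert metres to centimetres to match the units reported above.
mte_cm = 100.0 * median_trajectory_error(pred, gt)
```

Using the median rather than the mean keeps a few badly drifted tracks from dominating the score.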

## Abstract

We introduce the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Unlike existing monocular trackers, which struggle with depth ambiguities and occlusion, or prior multi-camera methods that require over 20 cameras and tedious per-sequence optimization, our feed-forward model directly predicts 3D correspondences using a practical number of cameras (e.g., four), enabling robust and accurate online tracking. Given known camera poses and either sensor-based or estimated multi-view depth, our tracker fuses multi-view features into a unified point cloud and applies k-nearest-neighbors correlation alongside a transformer-based update to reliably estimate long-range 3D correspondences, even under occlusion. We train on 5K synthetic multi-view Kubric sequences and evaluate on two real-world benchmarks: Panoptic Studio and DexYCB, achieving median trajectory errors of 3.1 cm and 2.0 cm, respectively. Our method generalizes well to diverse camera setups of 1-8 views with varying vantage points and video lengths of 24-150 frames. By releasing our tracker alongside training and evaluation datasets, we aim to set a new standard for multi-view 3D tracking research and provide a practical tool for real-world applications. Project page available at https://ethz-vlg.github.io/mvtracker.
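
The geometric part of the pipeline described in the abstract (lifting per-view depth into one shared point cloud, then looking up nearest neighbors around a query point) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera model and a 3x4 camera-to-world pose, uses a brute-force neighbor search, and omits the learned features, the correlation computation, and the transformer update entirely. All names (`unproject`, `fuse_views`, `knn`) and values are hypothetical.

```python
import math


def unproject(u, v, depth, fx, fy, cx, cy, cam_to_world):
    """Lift pixel (u, v) with metric depth to a world-space 3D point.

    Assumes a pinhole camera; cam_to_world is a 3x4 row-major [R | t] matrix.
    """
    x_cam = (u - cx) / fx * depth
    y_cam = (v - cy) / fy * depth
    p_cam = (x_cam, y_cam, depth, 1.0)
    return tuple(sum(r * c for r, c in zip(row, p_cam)) for row in cam_to_world)


def fuse_views(views):
    """Merge per-view depth samples into one list of (point, view_id) entries."""
    cloud = []
    for view_id, view in enumerate(views):
        for u, v, d in view["pixels"]:
            point = unproject(u, v, d, *view["intrinsics"], view["cam_to_world"])
            cloud.append((point, view_id))
    return cloud


def knn(cloud, query, k):
    """Brute-force k nearest neighbours of `query` in the fused cloud."""
    return sorted(cloud, key=lambda entry: math.dist(entry[0], query))[:k]


# Two toy cameras sharing intrinsics (fx, fy, cx, cy); camera 1 sits 1 m along +x.
intrinsics = (100.0, 100.0, 50.0, 50.0)
views = [
    {
        "intrinsics": intrinsics,
        "cam_to_world": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
        "pixels": [(50, 50, 2.0), (150, 50, 2.0)],  # (u, v, depth in metres)
    },
    {
        "intrinsics": intrinsics,
        "cam_to_world": [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]],
        "pixels": [(50, 50, 2.0)],
    },
]

cloud = fuse_views(views)
neighbours = knn(cloud, query=(0.9, 0.0, 2.0), k=2)
```

Because every point lives in one world frame, the neighbors of a query can come from any camera, which is what lets a point that is occluded in one view still be supported by others. In a real system each cloud entry would also carry a feature vector, and a spatial index would replace the brute-force sort.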

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21060/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21060/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/2508.21060/full.md

---
Source: https://tomesphere.com/paper/2508.21060