TL;DR
DePT3R is a unified deep learning framework that performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass, without needing camera poses.
Contribution
DePT3R introduces a multi-task learning approach: a backbone extracts deep spatio-temporal features, and dense prediction heads regress pixel-wise maps, so dense point tracking and 3D reconstruction are handled jointly in a single forward pass.
Findings
Strong performance on challenging dynamic scene benchmarks.
Significant memory efficiency improvements over existing methods.
Operates without requiring camera pose information.
Abstract
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume temporal ordering of input frames, thereby constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its…
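The "backbone plus dense prediction heads" design described in the abstract can be illustrated with a toy, shape-level sketch. This is not the authors' architecture: the feature definition, the two head names (`points`, `motion`), the output dimensions, and the random weights are all hypothetical placeholders chosen only to show how one forward pass over an unordered set of frames can yield several pixel-wise maps from shared features.

```python
import random

def backbone(frames):
    """Toy stand-in for a shared backbone: one feature vector per pixel per frame.

    Each pixel's feature is [own intensity, mean intensity at that pixel over
    all frames], so every frame sees information from the whole set and the
    result does not depend on frame order (an assumption of this sketch).
    """
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    return [[[[f[y][x], mean[y][x]] for x in range(w)] for y in range(h)] for f in frames]

def dense_head(features, weights):
    """Toy dense prediction head: a per-pixel linear map from features to 3 outputs."""
    return [[[[sum(wi * fi for wi, fi in zip(row, px)) for row in weights]
              for px in line] for line in frame] for frame in features]

def forward(frames, point_w, motion_w):
    """Single forward pass: shared features, then two pixel-wise maps per frame."""
    feats = backbone(frames)
    return {
        "points": dense_head(feats, point_w),   # N x H x W x 3, hypothetical 3D point map
        "motion": dense_head(feats, motion_w),  # N x H x W x 3, hypothetical motion map
    }

rng = random.Random(0)
N, H, W = 3, 4, 5  # three tiny grayscale frames, no camera poses supplied
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(N)]
point_w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]   # placeholder weights
motion_w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # placeholder weights
out = forward(frames, point_w, motion_w)
```

Both heads read the same features, which is the sense in which the tasks are learned jointly here. Because the toy backbone aggregates over the frame set with a mean, reordering the input frames only reorders the outputs.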
