Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Holynski

TL;DR
This paper introduces Stereo4D, a system that automatically mines high-quality dynamic 3D (4D) reconstructions from internet stereo videos. The resulting data can be used to train models that predict 3D motion, a task for which ground truth annotations are otherwise hard to obtain.
Contribution
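The authors develop a pipeline that fuses and filters camera pose, stereo depth, and temporal tracking estimates to mine large-scale, high-quality 4D reconstructions from internet stereo videos. The mined reconstructions serve as pseudo-ground-truth for supervised learning of 3D motion.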
Findings
Generated large-scale 4D data enables effective training of motion prediction models.
Models trained on reconstructed data generalize well to real-world scenes.
The system achieves high-quality dynamic 3D reconstructions from diverse internet videos.
Abstract
Learning to understand dynamic 3D scenes from imagery is crucial for applications ranging from robotics to scene reconstruction. Yet, unlike other problems where large-scale supervised training has enabled rapid progress, directly supervising methods for recovering 3D motion remains challenging due to the fundamental difficulty of obtaining ground truth annotations. We present a system for mining high-quality 4D reconstructions from internet stereoscopic, wide-angle videos. Our system fuses and filters the outputs of camera pose estimation, stereo depth estimation, and temporal tracking methods into high-quality dynamic 3D reconstructions. We use this method to generate large-scale data in the form of world-consistent, pseudo-metric 3D point clouds with long-term motion trajectories. We demonstrate the utility of this data by training a variant of DUSt3R to predict structure and 3D…
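To make the fusion step in the abstract concrete, here is a minimal sketch of how per-frame stereo disparity, a 2D point track, and camera poses could be combined into a world-space 3D trajectory. It is an illustration, not the authors' implementation: it assumes a rectified pinhole stereo pair (the source videos are wide-angle, so rectification would have to happen first), and the names `depth_from_disparity`, `unproject`, `camera_to_world`, and `lift_track` are hypothetical. The filtering stage the abstract mentions is not modeled.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]  # (R, t): camera-to-world rotation and translation


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth along the optical axis for a rectified pinhole stereo pair (assumption)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def unproject(u: float, v: float, depth: float,
              focal_px: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) at the given depth into camera coordinates."""
    return ((u - cx) * depth / focal_px, (v - cy) * depth / focal_px, depth)


def camera_to_world(p_cam: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply the camera-to-world transform X_world = R @ X_cam + t."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


def lift_track(track_2d: Sequence[Tuple[float, float]],
               disparities: Sequence[float],
               poses: Sequence[Pose],
               focal_px: float, cx: float, cy: float,
               baseline_m: float) -> List[Vec3]:
    """Lift one 2D track (one pixel per frame) to a world-space 3D trajectory."""
    trajectory = []
    for (u, v), d, (R, t) in zip(track_2d, disparities, poses):
        z = depth_from_disparity(d, focal_px, baseline_m)
        trajectory.append(camera_to_world(unproject(u, v, z, focal_px, cx, cy), R, t))
    return trajectory
```

Because every frame's point is expressed in one shared world frame, camera motion is factored out, and what remains in the trajectory is the motion of the scene point itself. Since the stereo baseline is known only approximately for internet videos, the resulting scale is best read as pseudo-metric, as the abstract puts it.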
Taxonomy
Topics: Multimedia Communication and Technology
