The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos
Zhuoyuan Wu, Xurui Yang, Jiahui Huang, Yue Wang, Jun Gao

TL;DR
This paper introduces the Dynamic Prior, a novel approach leveraging vision-language models and segmentation techniques to improve 3D scene understanding and motion segmentation in dynamic videos without task-specific training.
Contribution
The Dynamic Prior method enables robust dynamic object identification in videos, enhancing 3D structure estimation without relying on large-scale motion segmentation datasets.
Findings
Achieves state-of-the-art motion segmentation performance.
Significantly improves 3D structure accuracy and robustness.
Effective on both synthetic and real-world videos.
Abstract
Estimating accurate camera poses, 3D scene geometry, and object motion from in-the-wild videos is a long-standing challenge for classical structure-from-motion pipelines due to the presence of dynamic objects. Recent learning-based methods attempt to overcome this challenge by training motion estimators to filter dynamic objects and focus on the static background. However, their performance is largely limited by the availability of large-scale motion segmentation datasets, resulting in inaccurate segmentation and, therefore, inferior structural 3D understanding. In this work, we introduce the Dynamic Prior to robustly identify dynamic objects without task-specific training, leveraging the powerful reasoning capabilities of Vision-Language Models (VLMs) and the fine-grained spatial segmentation capacity of SAM2. The Dynamic Prior can be seamlessly integrated into state-of-the-art…
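The abstract's central idea is that dynamic regions must be excluded before camera pose and geometry are estimated, so that the solver only sees the static background. The snippet below is a minimal sketch of that filtering step, assuming per-frame boolean masks (True marks a dynamic pixel) are already available, for example from a hypothetical VLM + SAM2 stage. It drops any 2D correspondence whose endpoint lands on a dynamic pixel. The names `is_static` and `filter_static_matches`, the mask layout, and the nearest-pixel rounding are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) pixel coordinates
Match = Tuple[Point, Point]          # a correspondence between frame A and frame B
Mask = Sequence[Sequence[bool]]      # mask[y][x] is True where the pixel is dynamic


def is_static(pt: Point, dyn_mask: Mask) -> bool:
    """Return True if the point falls on a static (non-dynamic) pixel."""
    h, w = len(dyn_mask), len(dyn_mask[0])
    # Snap to the nearest pixel and clamp to the image bounds.
    x = min(max(int(round(pt[0])), 0), w - 1)
    y = min(max(int(round(pt[1])), 0), h - 1)
    return not dyn_mask[y][x]


def filter_static_matches(matches: Sequence[Match], mask_a: Mask, mask_b: Mask) -> List[Match]:
    """Keep only correspondences whose endpoints are static in both frames."""
    return [(p, q) for p, q in matches if is_static(p, mask_a) and is_static(q, mask_b)]


# Toy example: a 4x6 frame with a dynamic object covering rows 1-2, columns 2-3.
mask = [[False] * 6 for _ in range(4)]
for yy in (1, 2):
    for xx in (2, 3):
        mask[yy][xx] = True

matches = [
    ((0.0, 0.0), (0.4, 0.2)),  # background in both frames: kept
    ((2.0, 1.0), (2.2, 1.1)),  # lands on the dynamic object: dropped
    ((5.0, 3.0), (4.8, 2.9)),  # background in both frames: kept
]
kept = filter_static_matches(matches, mask, mask)
```

In a full pipeline the surviving matches would be passed to a pose or bundle-adjustment solver; the quality of the result then depends directly on how accurate the dynamic masks are, which is the gap the paper targets.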
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Advanced Vision and Imaging
