TL;DR
MotionScape is a large-scale, real-world UAV video dataset with highly dynamic 6-DoF camera motion, intended to improve world models for autonomous UAV navigation in complex environments.
Contribution
We present a large-scale, real-world UAV dataset with aligned semantic, geometric, and trajectory data, developed through an automated multi-stage processing pipeline.
Findings
Incorporating the dataset improves world models' ability to simulate 3D dynamics.
The dataset enhances decision-making and planning for UAVs in complex scenarios.
Semantic and geometric annotations benefit model training and robustness.
Abstract
Recent advances in world models have demonstrated strong capabilities in simulating physical reality, making them an increasingly important foundation for embodied intelligence. For UAV agents in particular, accurate prediction of complex 3D dynamics is essential for autonomous navigation and robust decision-making in unconstrained environments. However, under the highly dynamic camera trajectories typical of UAV views, existing world models often struggle to maintain spatiotemporal physical consistency. A key reason lies in the distribution bias of current training data: most existing datasets exhibit restricted 2.5D motion patterns, such as ground-constrained autonomous driving scenes or relatively smooth human-centric egocentric videos, and therefore lack realistic high-dynamic 6-DoF UAV motion priors. To address this gap, we present MotionScape, a large-scale real-world UAV-view…
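To make the 2.5D versus 6-DoF distinction in the abstract concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a pose is stored as a tuple (x, y, z, roll, pitch, yaw), and it reads "2.5D" as planar translation plus yaw, which is one possible interpretation of the ground-constrained case. The function name `active_dofs`, the threshold, and both synthetic trajectories are hypothetical.

```python
import math
from statistics import pstdev

# Assumed pose layout: (x, y, z, roll, pitch, yaw), angles in radians.
DOF_NAMES = ("x", "y", "z", "roll", "pitch", "yaw")


def active_dofs(poses, threshold=1e-3):
    """Return the pose components whose spread over the trajectory exceeds threshold.

    Angles are compared as raw values with no wrap-around handling,
    so this only makes sense for small rotations as used below.
    """
    columns = zip(*poses)  # one column per pose component
    return [name for name, col in zip(DOF_NAMES, columns) if pstdev(col) > threshold]


# Hypothetical ground-vehicle trajectory: planar translation plus yaw only.
ground = [(0.5 * t, 0.1 * t, 0.0, 0.0, 0.0, 0.02 * t) for t in range(20)]

# Hypothetical UAV trajectory: altitude, roll, and pitch also vary.
uav = [
    (
        0.5 * t,
        0.1 * t,
        2.0 + math.sin(0.3 * t),
        0.2 * math.sin(0.5 * t),
        0.1 * math.cos(0.4 * t),
        0.05 * t,
    )
    for t in range(20)
]

ground_dofs = active_dofs(ground)
uav_dofs = active_dofs(uav)
print(ground_dofs)  # ['x', 'y', 'yaw']
print(uav_dofs)     # all six components
```

Under these assumptions the ground trajectory excites only three of the six pose components, while the UAV trajectory excites all six. A model trained mostly on the former sees little variation in altitude, roll, or pitch.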
