MechVerse: Evaluating Physical Motion Consistency in Video Generation Models
Rahul Jain, Mayank Patel, Asim Unmesh, Karthik Ramani

TL;DR
MechVerse is a benchmark dataset designed to evaluate the mechanical and kinematic consistency of video generation models, highlighting their current limitations in generating physically plausible motion.
Contribution
The paper introduces MechVerse, a comprehensive benchmark with synthetic videos and structured prompts to assess and improve mechanism-aware video generation models.
Findings
Current models preserve appearance but often violate mechanical constraints.
Errors in motion increase with kinematic complexity.
MechVerse enables measurement and improvement of mechanism-aware video generation.
Abstract
Text- and image-conditioned video generation models have achieved strong visual fidelity and temporal coherence, but they often fail to generate motion governed by kinematic and geometric constraints. In these settings, object parts must remain rigid, maintain contact or coupling with neighboring components, and transfer motion consistently across connected parts. These requirements are especially explicit in articulated mechanical assemblies, where motion is constrained by rigid-link geometry, contact/coupling relations, and transmission through kinematic chains. A generated video may therefore appear plausible while violating the intended mechanism, such as rotating a part that should translate, deforming a rigid component, breaking coupling between parts, or failing to move downstream components. To evaluate this gap, we introduce MechVerse, a benchmark for mechanically consistent…
