sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only
Arslan Artykov, Tom Ravaud, Corentin Sautier, Vincent Lepetit

TL;DR
sim2art models articulated objects from a single monocular video using only synthetic training data. It combines a robust per-frame surface point sampling representation with a Transformer architecture and outperforms existing methods.
Contribution
The paper introduces sim2art, a synthetic-data-trained method that recovers 3D part segmentation and joint parameters from monocular videos without domain adaptation or real-world annotations.
Findings
Outperforms state-of-the-art methods in handling large camera motions and complex articulations.
Uses synthetic training data to generalize effectively to real-world sequences.
Introduces new diverse datasets for benchmarking articulated object modeling.
Abstract
Understanding articulated objects from monocular video is a crucial yet challenging task in robotics and digital twin creation. Existing methods often rely on complex multi-view setups, high-fidelity object scans, or fragile long-term point tracks that frequently fail in casual real-world captures. In this paper, we present sim2art, a data-driven framework that recovers the 3D part segmentation and joint parameters of articulated objects from a single monocular video captured by a freely moving camera. Our core insight is a robust representation based on per-frame surface point sampling, which we augment with short-term scene flow and DINOv3 semantic features. Unlike previous works that depend on error-prone long-term correspondences, our representation is easy to obtain and exhibits a negligible difference between simulation and reality without requiring domain adaptation. Also, by…
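To make the representation described in the abstract more concrete, the following is a minimal sketch of how per-frame sampled surface points might be combined with short-term scene flow and semantic features into one array. It is not the authors' implementation. The function name `build_frame_tokens`, the array shapes, the simple concatenation, and the random arrays standing in for DINOv3 features are all assumptions made for illustration.

```python
import numpy as np


def build_frame_tokens(points, next_points, features, num_samples, rng):
    """Sample surface points for one frame and attach flow and semantic features.

    Hypothetical helper, not taken from the paper.
    points:      (N, 3) surface points in frame t
    next_points: (N, 3) positions of the same points in frame t+1 (short-term only)
    features:    (N, D) per-point semantic descriptors (stand-in for DINOv3 features)
    """
    # Draw a fixed-size random subset of the surface points, without repeats.
    idx = rng.choice(points.shape[0], size=num_samples, replace=False)
    # Short-term scene flow: displacement between two consecutive frames.
    flow = next_points[idx] - points[idx]
    # One row per sampled point: [x, y, z, flow_x, flow_y, flow_z, features...].
    return np.concatenate([points[idx], flow, features[idx]], axis=1)


# Toy data: 500 surface points that all shift slightly along x between frames.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
next_points = points + np.array([0.01, 0.0, 0.0])
features = rng.normal(size=(500, 16))  # assumed feature size, chosen arbitrarily

tokens = build_frame_tokens(points, next_points, features, 128, rng)
print(tokens.shape)
```

Because each frame is sampled independently and flow only spans adjacent frames, a sketch like this never needs a point to be tracked across the whole video, which is the property the abstract contrasts with long-term correspondences.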
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
