sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only

Arslan Artykov; Tom Ravaud; Corentin Sautier; Vincent Lepetit

arXiv:2512.07698 · cs.CV · March 24, 2026

Open Access

TL;DR

sim2art is a framework that models articulated objects from a single monocular video after training on synthetic data only. It combines a per-frame surface point sampling representation with a Transformer architecture and is reported to outperform existing methods.

Contribution

The paper introduces sim2art, a synthetic-data-trained method that recovers 3D part segmentation and joint parameters from monocular videos without domain adaptation or real-world annotations.

Findings

01. Outperforms state-of-the-art methods on sequences with large camera motions and complex articulations.

02. Trains on synthetic data only and still generalizes to real-world sequences.

03. Introduces new, diverse datasets for benchmarking articulated object modeling.

Abstract

Understanding articulated objects from monocular video is a crucial yet challenging task in robotics and digital twin creation. Existing methods often rely on complex multi-view setups, high-fidelity object scans, or fragile long-term point tracks that frequently fail in casual real-world captures. In this paper, we present sim2art, a data-driven framework that recovers the 3D part segmentation and joint parameters of articulated objects from a single monocular video captured by a freely moving camera. Our core insight is a robust representation based on per-frame surface point sampling, which we augment with short-term scene flow and DINOv3 semantic features. Unlike previous works that depend on error-prone long-term correspondences, our representation is easy to obtain and exhibits a negligible difference between simulation and reality without requiring domain adaptation. Also, by…
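To make the representation described in the abstract more concrete, here is a minimal sketch of how per-frame inputs might be assembled: a subset of surface points is sampled from one frame, and each sampled point is paired with its short-term scene flow vector and a semantic feature vector. This is an illustration under assumptions, not the authors' implementation. The function name `build_frame_tokens`, the uniform random sampling, the simple concatenation, and the feature dimension are all hypothetical; the abstract only states that the representation uses per-frame surface point sampling augmented with short-term scene flow and DINOv3 features.

```python
import random
from typing import List, Sequence


def build_frame_tokens(
    points: Sequence[Sequence[float]],    # (N, 3) surface points of one frame
    flows: Sequence[Sequence[float]],     # (N, 3) short-term scene flow per point
    features: Sequence[Sequence[float]],  # (N, D) semantic feature per point (assumed precomputed)
    num_samples: int,
    seed: int = 0,
) -> List[List[float]]:
    """Sample points from a single frame and concatenate xyz, flow, and feature.

    Hypothetical helper: uniform sampling and plain concatenation are
    assumptions made for illustration only.
    """
    if not (len(points) == len(flows) == len(features)):
        raise ValueError("points, flows, and features must have the same length")

    rng = random.Random(seed)
    k = min(num_samples, len(points))
    # Points are sampled independently for this frame; no long-term tracks are needed.
    indices = rng.sample(range(len(points)), k)

    return [list(points[i]) + list(flows[i]) + list(features[i]) for i in indices]


if __name__ == "__main__":
    n, d = 10, 4  # toy sizes; real feature dimensions would be much larger
    pts = [[float(i), 0.0, 1.0] for i in range(n)]
    flw = [[0.01, 0.0, 0.0] for _ in range(n)]
    fts = [[0.5] * d for _ in range(n)]
    tokens = build_frame_tokens(pts, flw, fts, num_samples=5)
    print(len(tokens), len(tokens[0]))
```

Because each frame is sampled on its own, a sketch like this never needs the same physical point to be tracked across the whole video, which is the property the abstract contrasts with long-term correspondences.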

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis