TL;DR
This paper introduces a primitive-based, category-agnostic optimization framework that recovers the 3D kinematics of articulated objects from a single casually captured video and is designed to stay robust under severe occlusions and rapid camera ego-motion.
Contribution
The paper presents a primitive-fitting approach that jointly optimizes part segmentation and the parameters of revolute and prismatic joints, and reports outperforming existing methods on two new benchmarks, AiP-synth and AiP-real.
Findings
Outperforms existing methods on AiP-synth and AiP-real benchmarks.
Effectively handles occlusions and rapid camera ego-motion.
Recovers 3D kinematics from a single casually captured video.
Abstract
Retrieving the 3D kinematics of articulated objects from monocular video is a fundamental challenge in computer vision. Existing methods rely on complex video setups or cues such as long-term point tracking or wide-baseline matching, but are frequently brittle under severe occlusions, rapid camera ego-motion, or weak local features. Learning-based methods, meanwhile, struggle to generalize beyond their training categories. We propose a category-agnostic optimization framework that treats articulated object understanding as a primitive-fitting problem. Geometric primitives serve as a proxy representation that avoids the pitfalls of unstable point tracks; a novel mechanism organizes them into coherent parts constrained by revolute and prismatic joints. Our formulation jointly optimizes part segmentation and joint parameters, recovering complex kinematics from a single casually captured…
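The two joint types named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`revolute_transform`, `prismatic_transform`, `joint_residual`) and the toy "door" points are hypothetical, and the residual is a plain mean point distance chosen for illustration. The sketch only shows how a revolute joint (rotation about an axis through a pivot) and a prismatic joint (translation along an axis) move a part, and how a candidate joint hypothesis could be scored against observed geometry.

```python
import numpy as np

def revolute_transform(points, axis, pivot, angle):
    """Rotate points by `angle` radians about the line through `pivot` along `axis`."""
    k = axis / np.linalg.norm(axis)
    p = points - pivot
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    # Rodrigues' rotation formula, applied to every row of p.
    rotated = (p * cos_a
               + np.cross(k, p) * sin_a
               + np.outer(p @ k, k) * (1.0 - cos_a))
    return rotated + pivot

def prismatic_transform(points, axis, displacement):
    """Translate points by `displacement` along the unit direction of `axis`."""
    k = axis / np.linalg.norm(axis)
    return points + displacement * k

def joint_residual(predicted, observed):
    """Mean Euclidean distance between predicted and observed points (illustrative score)."""
    return float(np.mean(np.linalg.norm(predicted - observed, axis=1)))

# Hypothetical part: three points on a door hinged along the z-axis at x = 0, y = 0.
door = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.5, 0.0, 0.5]])
z_axis = np.array([0.0, 0.0, 1.0])
hinge = np.zeros(3)

# Simulated observation: the door swings open by 0.3 rad.
observed = revolute_transform(door, z_axis, hinge, 0.3)

# Score two competing hypotheses for the same motion.
revolute_error = joint_residual(revolute_transform(door, z_axis, hinge, 0.3), observed)
prismatic_error = joint_residual(prismatic_transform(door, np.array([0.0, 1.0, 0.0]), 0.3), observed)
print(revolute_error < prismatic_error)
```

In a full optimization the axis, pivot, angle, and displacement would be free variables rather than fixed values, and the score would compare fitted primitives rather than known point correspondences; the sketch fixes them only to keep the example self-contained.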
