Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
Hao Ai, Wenjie Chang, Jianbo Jiao, Ales Leonardis, Ofek Eyal

TL;DR
This paper introduces AiM, a framework that analyzes articulated objects from a user-object interaction video and a start-state scan, producing part segmentation and kinematic analysis without prior structural assumptions.
Contribution
AiM employs a dual-Gaussian scene representation and sequential RANSAC to perform part segmentation and articulation analysis without prior knowledge of the number of parts and without observations of two articulation states.
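To make the sequential RANSAC idea concrete, here is a minimal sketch of how repeated model fitting can discover an unknown number of parts. It is not the authors' implementation: this summary does not say what model AiM fits, so the sketch assumes each part follows one rigid motion between two sets of corresponding 3D points. The names `fit_rigid` and `sequential_ransac`, and the parameters `thresh`, `min_inliers`, and `iters`, are hypothetical.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rigid transform (Kabsch) such that dst ~= src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def sequential_ransac(src, dst, thresh=0.01, min_inliers=30, iters=200, seed=0):
    """Peel off one rigid motion at a time until too few points remain.

    Assumption (not from the paper): src[i] and dst[i] are the same physical
    point before and after motion. Returns per-point labels (-1 = unassigned)
    and the list of fitted (R, t) motions, one per discovered part.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(len(src), -1)
    remaining = np.arange(len(src))
    motions = []
    while len(remaining) >= min_inliers:
        best = np.empty(0, dtype=int)
        for _ in range(iters):
            # Three correspondences are the minimal sample for a rigid motion.
            sample = rng.choice(remaining, size=3, replace=False)
            R, t = fit_rigid(src[sample], dst[sample])
            err = np.linalg.norm(src[remaining] @ R.T + t - dst[remaining], axis=1)
            inliers = remaining[err < thresh]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_inliers:
            break  # no further motion has enough support
        R, t = fit_rigid(src[best], dst[best])  # refit on all inliers
        labels[best] = len(motions)
        motions.append((R, t))
        remaining = np.setdiff1d(remaining, best)
    return labels, motions
```

Calling `sequential_ransac(src, dst)` on two `(N, 3)` arrays returns one label per point, and `len(motions)` is the discovered part count. The loop stops on its own once no candidate motion gathers `min_inliers` supporters, which is why the part count does not need to be supplied.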
Findings
Outperforms previous methods in part segmentation quality
Automatically determines the number of parts and their kinematics
Demonstrates strong generalization on complex objects
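As an illustration of what determining a part's kinematics can involve, the sketch below classifies a single rigid motion as static, prismatic (pure translation), or revolute (rotation). This is standard rigid-body geometry rather than AiM's procedure, which this summary does not describe. The function name `classify_joint` and the tolerances `angle_eps` and `dist_eps` are hypothetical.

```python
import numpy as np

def classify_joint(R, t, angle_eps=1e-3, dist_eps=1e-6):
    """Classify a rigid motion x -> R @ x + t by its rotation angle.

    Returns (joint_type, axis_direction, magnitude), where magnitude is the
    translation distance for prismatic joints and the angle in radians for
    revolute joints. Assumption: the rotation angle is well below pi, since
    the axis formula used here degenerates as sin(angle) approaches zero.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    # Rotation angle from the trace identity: trace(R) = 1 + 2 cos(angle).
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < angle_eps:
        dist = np.linalg.norm(t)
        if dist < dist_eps:
            return "static", None, 0.0
        return "prismatic", t / dist, dist
    # Rotation axis from the skew-symmetric part of R.
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return "revolute", axis, angle
```

This sketch recovers only the axis direction. Locating a revolute axis in space would additionally require a point on the axis, which it does not compute.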
Abstract
Articulated objects are ubiquitous in daily life. Our goal is to achieve a high-quality reconstruction, segmentation of independent moving parts, and analysis of articulation. Recent methods analyse two different articulation states and perform per-point part segmentation, optimising per-part articulation using cross-state correspondences, given a priori knowledge of the number of parts. Such assumptions greatly limit their applications and performance. Their robustness is reduced when objects cannot be clearly visible in both states. To address these issues, in this paper, we present a new framework, Articulation in Motion (AiM). We infer part-level decomposition, articulation kinematics, and reconstruct an interactive 3D digital replica from a user-object interaction video and a start-state scan. We propose a dual-Gaussian scene representation that is learned from an initial 3DGS scan…
Peer Reviews
Decision: ICLR 2026 Poster
This is an extremely well-motivated approach where the design decisions have a clear and obvious contribution and that substantially improves over multiple recent existing approaches. The writing is clear, and the design decisions are well supported by experimental evaluation.
+ Comfortably state-of-the-art
+ Good ablation studies showing the contribution of each component.
+ Thoughtful qualitative results/figures making it clear how each component is useful.
This paper opens the door to mor
The clarity of the approach probably works against this paper (at least at review time, this clarity is a benefit when published), and I suspect that some of the other reviews may complain about lack of novelty as it looks obvious in hindsight. Of course, if it was obvious, the existing approaches would already be doing something similar, and the performance improvement would be less noticeable. I think describing the dual Gaussian approach as prior free is somewhat misleading. The reason it w
• The internal organisation of the paper is a significant strength, ensuring that each component of the proposed solution is introduced and explained coherently and sequentially.
• The proposed framework introduces a strong methodological contribution, integrating several state-of-the-art algorithms in a novel manner to solve an important problem, such as the analysis and reconstruction of articulated objects without any prior knowledge of the object's moving parts.
• The quantitative and qualit
- The dual Gaussian representation is introduced as novel, but it seems the same as the dense static Gaussians and sparse dynamic Gaussians introduced in ArtGS.
- Apart from the novelty of the video interaction, it is not quite clear where the novelty here is w.r.t. ArtGS.
1. The paper explores a new setting that copes with start-state scans and a human-object interaction video instead of the two-state multi-view observations used in previous papers. This sounds more reasonable and practical than previous papers.
2. The proposed method uses RANSAC to perform part discovery, eliminating the need to know the number of parts in advance.
3. The idea of a dual-Gaussian representation that differentiates static and moving Gaussians sounds reasonable to me.
1. One of the claims of this paper is that it copes with a human-object interaction video and a start-state scan instead of two-state multi-view observations. But the dataset used in the paper seems to be rendered from PartNet-Mobility. Some real-world examples would be helpful.
2. The baseline choices seem weird. The chosen baselines are mainly PARIS, DTA, and ArtGS, which mainly focus on two-state observations. A more plausible baseline may be Video2Articulation [1].
3. The authors should
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
