PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
Haowen Wang, Xiaoping Yuan, Zhao Jin, Zhen Zhao, Zhengping Che, Yousong Xue, Jin Tian, Yakun Huang, Jian Tang

TL;DR
PD$^{2}$GS introduces a novel framework for modeling articulated objects by learning a shared Gaussian field and representing interaction states as continuous deformations, enabling accurate part-level decoupling and smooth control without manual supervision.
Contribution
It proposes a unified approach that encodes geometry and kinematics jointly, refines part boundaries with vision priors, and supports continuous control and accurate modeling of articulated objects.
Findings
Outperforms prior methods in geometric accuracy
Achieves superior kinematic modeling and control consistency
Demonstrates effectiveness on both synthetic and real datasets
Abstract
Articulated objects are ubiquitous and important in robotics, AR/VR, and digital twins. Most self-supervised methods for articulated object modeling reconstruct discrete interaction states and relate them via cross-state geometric consistency, yielding representational fragmentation and drift that hinder smooth control of articulated configurations. We introduce PD$^{2}$GS, a novel framework that learns a shared canonical Gaussian field and models the arbitrary interaction state as its continuous deformation, jointly encoding geometry and kinematics. By associating each interaction state with a latent code and refining part boundaries using generic vision priors, PD$^{2}$GS enables accurate and reliable part-level decoupling while enforcing mutual exclusivity between parts and preserving scene-level coherence. This unified formulation supports part-aware reconstruction, fine-grained…
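The core idea in the abstract, a single canonical set of Gaussians whose interaction state is a continuous deformation, can be illustrated with a toy sketch. This is not the authors' implementation: the scalar `state` stands in for the latent code, the single revolute joint about a z-parallel axis replaces the learned deformation, and the names `deform`, `rotate_z`, `part_logits`, `max_angle`, and `pivot` are hypothetical. Mutual exclusivity between parts is approximated here by a hard argmax over per-Gaussian part scores.

```python
import math

def rotate_z(point, angle, pivot):
    """Rotate a 3D point by `angle` about a z-parallel axis through `pivot`."""
    x, y, z = (point[i] - pivot[i] for i in range(3))
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + pivot[0], s * x + c * y + pivot[1], z + pivot[2])

def deform(centers, part_logits, state, max_angle, pivot):
    """Map canonical Gaussian centers to the interaction state `state` in [0, 1].

    Assumption: part 0 is static and part 1 is a revolute part that turns by
    state * max_angle. Each center belongs to exactly one part (argmax of its
    scores), so the parts are mutually exclusive.
    """
    deformed = []
    for point, logits in zip(centers, part_logits):
        part = max(range(len(logits)), key=lambda k: logits[k])
        if part == 1:
            deformed.append(rotate_z(point, state * max_angle, pivot))
        else:
            deformed.append(tuple(point))
    return deformed

# Toy canonical field: one Gaussian on the moving part, one on the static base.
centers = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
part_logits = [(0.1, 2.0), (3.0, -1.0)]
origin = (0.0, 0.0, 0.0)

closed = deform(centers, part_logits, 0.0, math.pi / 2, origin)
half_open = deform(centers, part_logits, 0.5, math.pi / 2, origin)
fully_open = deform(centers, part_logits, 1.0, math.pi / 2, origin)
```

Because every state is computed from the same canonical centers, intermediate values of `state` give intermediate poses without reconstructing a separate model per state, which is the property the abstract contrasts with discrete-state methods.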
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper introduces a unified framework that models articulated objects through continuous deformations of a shared canonical Gaussian field, effectively addressing the fragmentation and drift issues inherent in previous discrete-state reconstruction methods. 2. The method achieves part-level decoupling without manual supervision by leveraging generic vision priors and latent code associations, enabling fine-grained continuous control over articulated configurations. 3. The paper contrib
1. The reconstruction results exhibit excessive noise, particularly evident in the real-world examples shown in Figure 13, which raises concerns about the method's robustness in practical scenarios. 2. In Section 3.2 on deformable Gaussian splatting, the methodology bears strong similarity to existing 4DGS works such as [a], yet these related approaches are not cited or discussed. 3. The paper does not provide information about inference time per sample, which would be valuable for understandi
- Technical contribution: the paper proposes a conceptually elegant unification of geometry and kinematics via continuous deformation of a canonical Gaussian field. Coarse-to-fine segmentation combining motion trajectories with SAM-driven boundary refinement is both novel and effective. - RS-Art dataset is a meaningful contribution, bridging synthetic–real gaps with paired RGB-D data and 3D models. - Comprehensive experiments on an expanded PartNet-Mobility split and the new dataset de
- The pipeline is complex and involves many heuristic components, which limits the scalability of the method. - The proposed method appears to require multiple states, which places additional demands on data curation. Furthermore, ensuring that the camera coordinate systems of all states are aligned is a challenge. Outside the laboratory environment, such as in simple home scenarios, it is difficult to obtain multiple states with aligned coordinate systems, and the er
1. The newly proposed dataset RS-Art should be useful for further research if made public, especially the real-world captures. 2. The paper appears to achieve SOTA performance over baselines with multi-state multi-view images in most cases. 3. The authors conducted extensive experiments on different datasets.
1. The whole system seems to be composed of numerous parts, which may make it somewhat complicated and hard to extend. 2. Some visualizations of the newly proposed dataset, including the data itself and videos of the reconstructed results, would help readers grasp the new dataset. 3. The proposed method seems to be a little incremental, though it achieves the best performance in most cases. It does not deal with physical plausibility, such as 3D penetration. Its setting is also not unique as the main difference wi
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Human Pose and Action Recognition
