DICArt: Advancing Category-level Articulated Object Pose Estimation in Discrete State-Spaces

Li Zhang; Mingyu Mei; Ailing Wang; Xianhui Meng; Yan Zhong; Xinyuan Song; Liu Liu; Rujing Wang; Zaixing He; Cewu Lu

arXiv:2602.19565·cs.CV·February 27, 2026

Open Access

TL;DR

DICArt introduces a novel discrete diffusion framework for category-level articulated object pose estimation, effectively handling complex search spaces and kinematic constraints, leading to improved accuracy and robustness.

Contribution

The paper proposes DICArt, a discrete diffusion-based approach with hierarchical kinematic modeling for more accurate articulated pose estimation.

Findings

01 Outperforms existing methods on synthetic datasets

02 Demonstrates robustness on real-world data

03 Effectively incorporates kinematic constraints

Abstract

Articulated object pose estimation is a core task in embodied AI. Existing methods typically regress poses in a continuous space, but they often 1) struggle to navigate a large, complex search space and 2) fail to incorporate intrinsic kinematic constraints. In this work, we introduce DICArt (DIsCrete Diffusion for Articulation Pose Estimation), a novel framework that formulates pose estimation as a conditional discrete diffusion process. Instead of operating in a continuous domain, DICArt progressively denoises a noisy pose representation through a learned reverse diffusion procedure to recover the ground-truth (GT) pose. To improve modeling fidelity, we propose a flexible flow decider that dynamically determines whether each token should be denoised or reset, effectively balancing the real and noise distributions during diffusion. Additionally, we incorporate a hierarchical kinematic coupling…
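
To make the idea of denoising in a discrete state space concrete, the following is a minimal sketch and not the DICArt implementation. It assumes pose parameters are quantized into integer bins (tokens), that the forward process replaces tokens with uniform random bins, and that a simple confidence threshold stands in for the learned flow decider described in the abstract. The names `NUM_BINS`, `quantize`, `dequantize`, `corrupt`, `reverse_step`, and `predict_logits` are hypothetical and chosen for illustration only.

```python
import numpy as np

NUM_BINS = 64  # hypothetical number of discrete states per pose token


def quantize(values, low, high, num_bins=NUM_BINS):
    """Map continuous values in [low, high] to integer bin indices."""
    scaled = (np.asarray(values, dtype=float) - low) / (high - low)
    return np.clip((scaled * num_bins).astype(int), 0, num_bins - 1)


def dequantize(tokens, low, high, num_bins=NUM_BINS):
    """Map bin indices back to the centre value of each bin."""
    return low + (np.asarray(tokens) + 0.5) * (high - low) / num_bins


def corrupt(tokens, noise_rate, rng, num_bins=NUM_BINS):
    """Forward step (assumed): resample each token uniformly with prob. noise_rate."""
    resample = rng.random(tokens.shape) < noise_rate
    noise = rng.integers(0, num_bins, size=tokens.shape)
    return np.where(resample, noise, tokens)


def reverse_step(tokens, predict_logits, threshold, rng, num_bins=NUM_BINS):
    """One reverse step: denoise confident tokens, reset the others to noise.

    The threshold rule is an illustrative stand-in for a learned decider.
    predict_logits is a hypothetical callable returning (num_tokens, num_bins).
    """
    logits = predict_logits(tokens)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    predicted = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    noise = rng.integers(0, num_bins, size=tokens.shape)
    return np.where(confident, predicted, noise)
```

In this sketch, a full sampler would start from uniformly random tokens, call `reverse_step` repeatedly with a trained network in place of `predict_logits`, and pass the final tokens through `dequantize` to obtain pose values. How DICArt actually tokenizes poses, schedules noise, and conditions on observations is not specified in the text above.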

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis