TL;DR
DexHoldem introduces a comprehensive real-world benchmark for evaluating dexterous manipulation, perception, and embodied decision-making in Texas Hold'em poker using a ShadowHand dexterous robotic hand.
Contribution
It presents a new benchmark with demonstrations, perception and manipulation tasks, and case studies to assess embodied dexterous systems in a complex tabletop game.
Findings
The best policy achieves a 61.2% success rate on primitive execution.
Perception models achieve up to 66.8% accuracy in state recovery.
Case studies reveal how errors accumulate during closed-loop deployment.
Abstract
Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives, a standardized physical policy benchmark, and an agentic perception benchmark that tests whether agents can recover the structured game state needed for embodied decision making. On primitive execution, obtains the highest task completion rate (), while and tie on scene-preserving success rate (). On agentic perception, Opus 4.7 obtains the best…
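The abstract reports two primitive-execution metrics, task completion rate and scene-preserving success rate, but this page does not define them. The sketch below assumes a trial counts as scene-preserving only if it both completes the primitive's goal and leaves the table usable for later decisions. The `Trial` record, the primitive names, and both function names are hypothetical illustrations, not the benchmark's actual API or scoring code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One rollout of a manipulation primitive (hypothetical record format)."""
    primitive: str         # hypothetical primitive name, e.g. "push_chips"
    task_completed: bool   # did the primitive achieve its own goal?
    scene_preserved: bool  # was the rest of the table left usable afterwards?


def task_completion_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the primitive's goal was achieved."""
    if not trials:
        return 0.0
    return sum(t.task_completed for t in trials) / len(trials)


def scene_preserving_success_rate(trials: list[Trial]) -> float:
    """Fraction of trials that completed the task AND preserved the scene.

    Assumed definition; the benchmark's exact criterion may differ.
    """
    if not trials:
        return 0.0
    return sum(t.task_completed and t.scene_preserved for t in trials) / len(trials)


trials = [
    Trial("push_chips", True, True),
    Trial("push_chips", True, False),  # goal met, but a chip stack was knocked over
    Trial("flip_card", False, True),   # goal missed, table untouched
    Trial("flip_card", True, True),
]
print(task_completion_rate(trials))           # 0.75
print(scene_preserving_success_rate(trials))  # 0.5
```

Under this assumed definition, the scene-preserving rate can never exceed the task completion rate. That would explain why one policy can lead on completion while others tie on the stricter metric.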
