TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction

Ao Li; Yonggen Ling; Yiyang Lin; Yuji Wang; Yong Deng; Yansong Tang

arXiv:2604.08921·cs.CV·April 13, 2026

TL;DR

TAIHRI is a novel vision-language model designed for precise 3D localization of task-relevant human body parts in close-range human-robot interaction, enabling more natural and safe robot responses.

Contribution

The paper introduces the first VLM tailored for close-range HRI perception. It localizes task-critical body parts in 3D space through 2D keypoint reasoning and adapts to downstream tasks.

Findings

01 Achieves superior accuracy in localizing task-critical body parts.

02 Effectively adapts to natural language commands and global space recovery.

03 Demonstrates effectiveness on egocentric interaction benchmarks.

Abstract

Accurate 3D human keypoint localization is a critical technology enabling robots to achieve natural and safe physical interaction with users. Conventional 3D human keypoint estimation methods primarily focus on whole-body reconstruction quality relative to the root joint. However, in practical human-robot interaction (HRI) scenarios, robots are more concerned with the precise metric-scale spatial localization of task-relevant body parts in the egocentric camera's 3D coordinate frame. We propose TAIHRI, the first Vision-Language Model (VLM) tailored for close-range HRI perception, capable of understanding users' motion commands and directing the robot's attention to the most task-relevant keypoints. By quantizing 3D keypoints into a finite interaction space, TAIHRI precisely localizes the 3D spatial coordinates of critical body parts by 2D keypoint reasoning via next-token prediction,…
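The abstract's idea of quantizing 3D keypoints into a finite interaction space can be illustrated with a minimal sketch. This is not the paper's implementation: the workspace bounds (`SPACE_MIN`, `SPACE_MAX`), the bin count (`NUM_BINS`), the uniform binning scheme, and the function names `quantize` and `dequantize` are all assumptions chosen for illustration. The sketch only shows how metric camera-frame coordinates could be mapped to a bounded set of integer indices that a language model might emit as tokens, and then mapped back.

```python
import numpy as np

# Hypothetical bounds (in metres) of a close-range interaction space,
# expressed in the egocentric camera frame. Not taken from the paper.
SPACE_MIN = np.array([-1.0, -1.0, 0.0])
SPACE_MAX = np.array([1.0, 1.0, 2.0])
NUM_BINS = 1000  # hypothetical number of discrete bins per axis


def quantize(points_m):
    """Map metric 3D points (N, 3) to integer bin indices in [0, NUM_BINS - 1]."""
    pts = np.asarray(points_m, dtype=np.float64)
    normalized = (pts - SPACE_MIN) / (SPACE_MAX - SPACE_MIN)
    # Points outside the assumed interaction space are clamped to its boundary.
    normalized = np.clip(normalized, 0.0, 1.0)
    bins = (normalized * NUM_BINS).astype(np.int64)
    return np.minimum(bins, NUM_BINS - 1)


def dequantize(bins):
    """Map bin indices back to metric coordinates at the bin centres."""
    b = np.asarray(bins, dtype=np.float64)
    return SPACE_MIN + (b + 0.5) / NUM_BINS * (SPACE_MAX - SPACE_MIN)


if __name__ == "__main__":
    wrist = np.array([[0.25, -0.10, 0.80]])  # made-up keypoint, in metres
    tokens = quantize(wrist)
    recovered = dequantize(tokens)
    print(tokens, recovered)
```

Under these assumed settings each bin is 2 mm wide, so a point inside the space is recovered to within 1 mm per axis. A finer or coarser bin count trades vocabulary size against localization precision.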

Code & Models

Repositories

Tencent/TAIHRI (GitHub)
