ChatHuman: Chatting about 3D Humans with Tools

Jing Lin; Yao Feng; Weiyang Liu; Michael J. Black

arXiv:2405.04533 · cs.CV · May 30, 2025 · 3 cites

TL;DR

ChatHuman is a language-driven system that integrates multiple 3D human analysis tools within a unified framework, enabling effective discussion, interpretation, and application of complex 3D human data through an LLM-based interface.

Contribution

It introduces an LLM-based system that autonomously selects, applies, and interprets diverse 3D human analysis tools, overcoming the domain-specific challenges of adapting LLMs to 3D human tasks and improving performance on them.

Findings

1. Outperforms existing models in tool selection accuracy.
2. Supports interactive chat with users.
3. Handles complex 3D tool outputs effectively.

Abstract

Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including 3D pose, shape, contact, human-object interaction, and emotion. While widely applicable in vision and other areas, such methods require expert knowledge to select, use, and interpret the results. To address this, we introduce ChatHuman, a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks, adeptly discussing and resolving related challenges. Built on a Large Language Model (LLM) framework, ChatHuman is trained to autonomously select, apply, and interpret a diverse set of tools in response to user inputs. Our approach overcomes significant hurdles in adapting LLMs to 3D human tasks, including the need…
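The abstract describes a loop in which the system selects a tool, applies it, and interprets its output for the user. The Python sketch below illustrates only that control flow, under loudly stated assumptions: the tool names (`pose_estimator`, `contact_detector`), the placeholder tool functions, the keyword-overlap selector standing in for the trained LLM, and the template that turns tool output into a reply are all hypothetical. None of this is ChatHuman's actual implementation or API.

```python
import string
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Tool:
    name: str
    description: str
    keywords: List[str]          # hypothetical trigger words for the stand-in selector
    run: Callable[[str], dict]   # takes an image path, returns structured output


def estimate_pose(image_path: str) -> dict:
    # Hypothetical placeholder: a real tool would run a 3D pose and shape regressor.
    return {"image": image_path, "joints": 24}


def detect_contact(image_path: str) -> dict:
    # Hypothetical placeholder: a real tool would predict body-scene contact.
    return {"image": image_path, "contact_vertices": []}


TOOLS = [
    Tool("pose_estimator", "Estimates 3D body pose and shape",
         ["pose", "shape", "body"], estimate_pose),
    Tool("contact_detector", "Detects where the body touches the scene",
         ["contact", "touch"], detect_contact),
]


def _tokens(text: str) -> set:
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(string.punctuation) for w in text.lower().split()}


def select_tool(query: str, tools: List[Tool]) -> Optional[Tool]:
    # Stand-in for the LLM's tool choice: pick the tool with the most keyword hits.
    tokens = _tokens(query)
    best, best_hits = None, 0
    for tool in tools:
        hits = len(tokens & set(tool.keywords))
        if hits > best_hits:
            best, best_hits = tool, hits
    return best


def answer(query: str, image_path: str, tools: List[Tool] = TOOLS) -> str:
    tool = select_tool(query, tools)           # 1. select
    if tool is None:
        return "No suitable tool; answering from the language model alone."
    result = tool.run(image_path)              # 2. apply
    details = ", ".join(f"{k}={v}" for k, v in result.items())
    return f"[{tool.name}] {details}"          # 3. interpret (template stand-in)
```

For example, `answer("Estimate the body shape", "a.jpg")` returns `"[pose_estimator] image=a.jpg, joints=24"` because two query words match that tool's keywords. In ChatHuman, a trained LLM makes the selection and writes the interpretation instead of a keyword match and a fixed template.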

Taxonomy

Topics: Human Pose and Action Recognition · Natural Language Processing Techniques · Multimodal Machine Learning Applications