ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black

TL;DR
ChatHuman is a language-driven system that integrates multiple 3D human analysis tools within a unified framework, enabling effective discussion, interpretation, and application of complex 3D human data through an LLM-based interface.
Contribution
The paper introduces a novel LLM-based system that autonomously selects, applies, and interprets diverse 3D human analysis tools, addressing the challenges of adapting LLMs to 3D human tasks and improving performance.
Findings
Outperforms existing models in tool selection accuracy
Supports interactive user-chat capabilities
Effectively manages complex 3D outputs
Abstract
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including 3D pose, shape, contact, human-object interaction, and emotion. While widely applicable in vision and other areas, such methods require expert knowledge to select, use, and interpret the results. To address this, we introduce ChatHuman, a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks, adeptly discussing and resolving related challenges. Built on a Large Language Model (LLM) framework, ChatHuman is trained to autonomously select, apply, and interpret a diverse set of tools in response to user inputs. Our approach overcomes significant hurdles in adapting LLMs to 3D human tasks, including the need…
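The select, apply, and interpret loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: ChatHuman uses a trained LLM to choose tools, whereas the sketch below swaps in a simple word-overlap score as a stand-in. The tool names, descriptions, and stub functions (`pose`, `contact`, `emotion`, `select_tool`, `answer`) are all hypothetical and exist only to show the control flow.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Tool:
    """A hypothetical tool entry: a name, a text description, and a callable."""
    name: str
    description: str
    run: Callable[[str], dict]


def build_registry() -> Dict[str, Tool]:
    # Stub tools standing in for real 3D human analysis methods (assumption).
    tools = [
        Tool("pose", "estimate 3D pose and body shape",
             lambda image: {"tool": "pose", "image": image}),
        Tool("contact", "detect human-object contact",
             lambda image: {"tool": "contact", "image": image}),
        Tool("emotion", "recognize facial emotion",
             lambda image: {"tool": "emotion", "image": image}),
    ]
    return {t.name: t for t in tools}


def tokenize(text: str) -> set:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tool(query: str, registry: Dict[str, Tool]) -> Optional[Tool]:
    # Stand-in for the LLM's tool selection: pick the tool whose
    # description shares the most words with the query.
    words = tokenize(query)

    def score(tool: Tool) -> int:
        return len(words & tokenize(tool.description))

    best = max(registry.values(), key=score)
    return best if score(best) > 0 else None


def answer(query: str, image: str, registry: Dict[str, Tool]) -> str:
    # Select a tool, apply it, and turn its raw output into a text reply.
    tool = select_tool(query, registry)
    if tool is None:
        return "No suitable tool found."
    result = tool.run(image)
    return f"Used '{tool.name}' on {result['image']}."


registry = build_registry()
print(answer("What is the 3D pose of this person?", "photo.jpg", registry))
```

In a real system the word-overlap scorer would be replaced by the language model's own decision, and the final step would have the model interpret the tool's structured output rather than format a fixed string.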
Taxonomy
Topics: Human Pose and Action Recognition · Natural Language Processing Techniques · Multimodal Machine Learning Applications
