Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
Shixiang Tang, Yizhou Wang, Lu Chen, Yuan Wang, Sida Peng, Dan Xu, and Wanli Ouyang

TL;DR
This survey reviews the development of Human-centric Foundation Models (HcFMs) that unify perception, generation, and agentic capabilities for modeling digital humans and humanoid embodiments, highlighting recent advances and future challenges.
Contribution
It provides a comprehensive taxonomy and overview of HcFMs, categorizing approaches into perception, generation, unified models, and agentic models, serving as a roadmap for future research.
Findings
HcFMs unify diverse human-centric tasks into a single framework.
State-of-the-art techniques enable multi-modal understanding and high-fidelity content generation.
Emerging challenges include robustness, versatility, and interactive intelligence.
Abstract
Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs), inspired by the success of generalist models such as large language and vision models, have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis; (4) Human-centric Agentic Foundation…
Taxonomy
Topics: Social Robot Interaction and HRI · Multimodal Machine Learning Applications · Action Observation and Synchronization
