TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans
Aggelina Chatziagapi, Bindita Chaudhuri, Amit Kumar, Rakesh Ranjan, Dimitris Samaras, Nikolaos Sarafianos

TL;DR
This paper presents TalkinNeRF, a unified neural radiance field framework that models full-body talking humans, capturing detailed facial expressions, hand gestures, and body pose from monocular videos for realistic animation.
Contribution
The work introduces a novel NeRF-based model that jointly represents full-body motion, including hands and face, with multi-identity and pose generalization capabilities, advancing full-body human animation.
Findings
Achieves state-of-the-art full-body talking human animation.
Effectively models detailed hand and facial movements.
Generalizes to novel identities given only a short video as input.
Abstract
We introduce a novel framework that learns a dynamic neural radiance field (NeRF) for full-body talking humans from monocular videos. Prior work represents only the body pose or the face. However, humans communicate with their full body, combining body pose, hand gestures, as well as facial expressions. In this work, we propose TalkinNeRF, a unified NeRF-based network that represents the holistic 4D human motion. Given a monocular video of a subject, we learn corresponding modules for the body, face, and hands, that are combined together to generate the final result. To capture complex finger articulation, we learn an additional deformation field for the hands. Our multi-identity representation enables simultaneous training for multiple subjects, as well as robust animation under completely unseen poses. It can also generalize to novel identities, given only a short video as input. We…
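The modular design described in the abstract (separate body, face, and hand modules, an extra deformation field for the hands, and a shared multi-identity representation) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `make_mlp`, `query_field`, `POSE_DIM`, and `ID_DIM`, the random-weight MLPs standing in for trained networks, the per-point part labels, and the simple "route each point to its part's module" composition rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the pose/expression code and the per-identity code.
POSE_DIM, ID_DIM = 8, 4
FIELD_IN = 3 + POSE_DIM + ID_DIM  # 3D point + conditioning vector
BODY, FACE, HAND = 0, 1, 2


def make_mlp(in_dim, hidden, out_dim):
    """Random-weight two-layer ReLU MLP standing in for a trained network."""
    w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2


# One radiance module per part (outputs RGB logits + raw density),
# plus an extra deformation field used only for hand points.
fields = {part: make_mlp(FIELD_IN, 32, 4) for part in (BODY, FACE, HAND)}
hand_deform = make_mlp(FIELD_IN, 32, 3)


def query_field(points, part_ids, pose, identity):
    """Return (rgb, sigma) for each 3D point, routed by its part label."""
    n = points.shape[0]
    # The same pose and identity codes condition every point.
    cond = np.concatenate(
        [np.broadcast_to(pose, (n, POSE_DIM)),
         np.broadcast_to(identity, (n, ID_DIM))],
        axis=1,
    )
    out = np.zeros((n, 4))
    for part, field in fields.items():
        mask = part_ids == part
        if not mask.any():
            continue
        x = points[mask]
        if part == HAND:
            # Assumed form: the deformation field predicts a 3D offset
            # that is added to hand points before the radiance query.
            x = x + hand_deform(np.concatenate([x, cond[mask]], axis=1))
        out[mask] = field(np.concatenate([x, cond[mask]], axis=1))
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))  # sigmoid keeps colors in (0, 1)
    sigma = np.maximum(out[:, 3], 0.0)       # densities are non-negative
    return rgb, sigma
```

In this sketch, swapping the `identity` vector while keeping the module weights fixed is the stand-in for multi-identity training: all subjects share the networks and differ only in their code. How the actual method combines its modules and conditions on identity is not specified in the text above.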
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications
