Body of Her: A Preliminary Study on End-to-End Humanoid Agent

Tenglong Ao

arXiv:2408.02879 · cs.CV · August 7, 2024

TL;DR

This paper introduces a real-time, multimodal humanoid agent system that models speech, full-body movements, and manipulation, aiming to close the gap between existing systems and realistic, interactive virtual humanoid agents.

Contribution

It presents an end-to-end neural network, extended from a pre-trained large language model, that integrates audio and visual inputs to produce realistic, duplex humanoid agent behaviors.

Findings

01. Demonstrates capabilities such as generalized object manipulation.
02. Achieves real-time duplex communication.
03. Models full-body movements and facial expressions.

Abstract

An interactive virtual humanoid agent is a crucial interface with the physical world. A relatively complete humanoid agent first needs a face and a body, then needs both verbal and non-verbal abilities (such as eye contact, facial expression, lip motion, gesture, and manipulation), and finally must be capable of real-time duplex communication, e.g., the ability to actively interrupt conversations. Most prior systems consider only a subset of these elements, leaving a gap between them and a realistic humanoid agent. In this work, we propose a real-time, duplex, interactive end-to-end network capable of modeling realistic agent behaviors, including speech and full-body movements for talking, responding, idling, and manipulation. This system is a multimodal model that integrates audio and visual inputs and is extended from a pre-trained large language model (LLM). We collect approximately 200,000 hours…
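
The abstract singles out real-time duplex communication, such as actively interrupting a conversation. The toy sketch below is not the paper's model; it only illustrates the difference between turn-taking and duplex interaction, using a hand-written rule in place of a learned network. `InputFrame`, `OutputFrame`, `step`, and `run` are hypothetical names chosen for illustration, and the per-step flags are assumed inputs. The three body states reuse the behavior names from the abstract (talking, responding, idling).

```python
from dataclasses import dataclass
from typing import List, NamedTuple


@dataclass(frozen=True)
class InputFrame:
    """One hypothetical time step of perceived input."""
    user_speaking: bool    # assumed voice-activity flag for the user
    agent_has_reply: bool  # assumed flag: the agent wants to say something now


class OutputFrame(NamedTuple):
    speaking: bool      # does the agent emit speech on this step?
    interrupting: bool  # is the agent speaking over the user?
    body: str           # hypothetical body state: "talking", "responding", "idling"


def step(frame: InputFrame, duplex: bool) -> OutputFrame:
    """Map one input frame to one output frame.

    The rule is a hand-written stand-in for a learned policy. With duplex=False
    the agent must wait for the user to fall silent (turn-taking); with
    duplex=True it may start talking while the user is still speaking.
    """
    if duplex:
        speaking = frame.agent_has_reply
    else:
        speaking = frame.agent_has_reply and not frame.user_speaking
    interrupting = speaking and frame.user_speaking
    if speaking:
        body = "talking"
    elif frame.user_speaking:
        body = "responding"  # e.g. listening with eye contact or nods
    else:
        body = "idling"
    return OutputFrame(speaking, interrupting, body)


def run(frames: List[InputFrame], duplex: bool) -> List[OutputFrame]:
    # One output frame is produced for every input frame, step by step.
    return [step(f, duplex) for f in frames]


if __name__ == "__main__":
    # The user talks for three steps; the agent wants to reply from step 1 on.
    stream = [
        InputFrame(user_speaking=True, agent_has_reply=False),
        InputFrame(user_speaking=True, agent_has_reply=True),
        InputFrame(user_speaking=True, agent_has_reply=True),
        InputFrame(user_speaking=False, agent_has_reply=True),
    ]
    for mode in (False, True):
        label = "duplex" if mode else "turn-taking"
        print(label, [o.body for o in run(stream, mode)])
```

In the turn-taking run the agent keeps responding until the user stops and only then talks; in the duplex run it starts talking at step 1, which is what an interruption looks like at the frame level. A real system would replace `step` with a model that reads audio and visual input and generates speech and motion on every step.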

Taxonomy

Topics: Ethics and Social Impacts of AI · Neuroethics, Human Enhancement, Biomedical Innovations · Utopian, Dystopian, and Speculative Fiction