Allo-AVA: A Large-Scale Multimodal Conversational AI Dataset for Allocentric Avatar Gesture Animation

Saif Punjwani; Larry Heck

arXiv:2410.16503·cs.AI·October 23, 2024

TL;DR

Allo-AVA is a large-scale, multimodal dataset designed to improve avatar gesture animation by providing synchronized speech, facial, and body movement data for virtual environment applications.

Contribution

The paper introduces Allo-AVA, a dataset of approximately 1,250 hours of video with audio, transcripts, and timestamped keypoints, intended for text- and audio-driven avatar gesture animation from an allocentric (third-person) view.

Findings

01. Enables development of more natural avatar animations.

02. Provides synchronized multimodal data for training AI models.

03. Facilitates research in virtual reality and digital assistants.

Abstract

The scarcity of high-quality, multimodal training data severely hinders the creation of lifelike avatar animations for conversational AI in virtual environments. Existing datasets often lack the intricate synchronization between speech, facial expressions, and body movements that characterizes natural human communication. To address this gap, we introduce Allo-AVA, a large-scale dataset specifically designed for text- and audio-driven avatar gesture animation in an allocentric (third-person point-of-view) context. Allo-AVA consists of approximately 1,250 hours of diverse video content, complete with audio, transcripts, and extracted keypoints. Allo-AVA maps these keypoints to precise timestamps, enabling accurate replication of human movements (body and facial gestures) in synchronization with speech. This comprehensive resource enables the development and evaluation of more…
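To make the idea of timestamped keypoints concrete, here is a minimal sketch of how keypoint frames might be aligned with transcript words by time. It does not use the actual Allo-AVA file format, which this page does not describe: the `KeypointFrame` and `Word` records, their fields, and the `frames_for_word` helper are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class KeypointFrame:
    # Hypothetical record: one set of extracted keypoints at one instant.
    timestamp: float                      # seconds from the start of the clip
    keypoints: list[tuple[float, float]]  # (x, y) per tracked landmark


@dataclass
class Word:
    # Hypothetical record: one transcript word with its spoken interval.
    text: str
    start: float  # seconds
    end: float    # seconds


def frames_for_word(frames: list[KeypointFrame], word: Word) -> list[KeypointFrame]:
    """Return the frames whose timestamp lies in [word.start, word.end].

    Assumes `frames` is sorted by timestamp.
    """
    times = [f.timestamp for f in frames]
    lo = bisect_left(times, word.start)
    hi = bisect_right(times, word.end)
    return frames[lo:hi]


# Toy data: ten frames at 10 frames per second, each with a single landmark.
frames = [KeypointFrame(i / 10, [(0.0, 0.0)]) for i in range(10)]
word = Word("hello", 0.25, 0.55)
gesture = frames_for_word(frames, word)
print(word.text, [f.timestamp for f in gesture])
```

Pairing each word with the frames inside its interval is one simple way to obtain speech-synchronized gesture segments; a real pipeline would likely need to handle frame-rate differences and gestures that lead or lag the words they accompany.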

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Multimodal Machine Learning Applications