TextToon: Real-Time Text Toonify Head Avatar from Single Video

Luchuan Song; Lele Chen; Celong Liu; Pinxin Liu; Chenliang Xu

arXiv:2410.07160 · cs.CV · October 10, 2024

TL;DR

TextToon creates high-quality, stylized (toonified) 3D head avatars from a single monocular video and a text instruction, and the resulting avatars can be animated in real time on GPUs and mobile devices.

Contribution

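It introduces a conditional embedding Tri-plane that learns realistic and stylized facial representations in a Gaussian deformation field, combined with an adaptive pixel-translation neural network, to generate stylized 3D avatars from a single monocular video.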
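To make the Tri-plane idea concrete, here is a minimal NumPy sketch under stated assumptions: each 3D point is projected onto three axis-aligned feature planes (xy, xz, yz), each plane is sampled bilinearly, the three features are summed, and a condition embedding (for example, a realistic versus a toonified style code) is added. A linear head then maps the features to per-Gaussian position offsets, standing in for a Gaussian deformation field. The helper names, the additive conditioning, and the linear head are all hypothetical; this page does not specify how TextToon conditions or decodes its Tri-plane.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at normalized coords u, v in [-1, 1]."""
    h, w, _ = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (w - 1)
    y = (np.clip(v, -1.0, 1.0) + 1.0) * 0.5 * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as interpolation weights, shape (N, 1).
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[y0, x0] * (1.0 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1.0 - wx) + plane[y1, x1] * wx
    return top * (1.0 - wy) + bottom * wy


def conditional_triplane_features(planes, points, cond):
    """Query three axis-aligned planes and inject a condition embedding.

    Hypothetical scheme: features are summed and the condition is added.
    """
    f_xy = bilinear_sample(planes["xy"], points[:, 0], points[:, 1])
    f_xz = bilinear_sample(planes["xz"], points[:, 0], points[:, 2])
    f_yz = bilinear_sample(planes["yz"], points[:, 1], points[:, 2])
    # Summing the three plane features is one common aggregation choice.
    return f_xy + f_xz + f_yz + cond[None, :]


def deform_gaussians(means, planes, cond, w_out):
    """Offset each Gaussian center by a linear read-out of its conditioned features."""
    feats = conditional_triplane_features(planes, means, cond)
    return means + feats @ w_out  # w_out: (C, 3), hypothetical linear head


rng = np.random.default_rng(0)
C = 8
planes = {name: rng.normal(size=(32, 32, C)) for name in ("xy", "xz", "yz")}
means = rng.uniform(-1.0, 1.0, size=(100, 3))  # toy Gaussian centers
cond_toon = rng.normal(size=C)  # hypothetical "toonified" style code
w_out = 0.01 * rng.normal(size=(C, 3))
deformed = deform_gaussians(means, planes, cond_toon, w_out)
print(deformed.shape)  # (100, 3)
```

Swapping `cond_toon` for a different embedding changes every queried feature, and hence every offset, without touching the planes themselves; that shared-geometry, switchable-style property is the intuition behind conditioning a single Tri-plane rather than training separate ones.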

Findings

1. Achieves real-time avatar rendering at 48 FPS on GPU
2. Outperforms existing methods in avatar quality and stylization
3. Can be driven in real time by a video of an arbitrary identity, with only a short monocular video as input

Abstract

We propose TextToon, a method to generate a drivable toonified avatar. Given a short monocular video sequence and a written instruction about the avatar style, our model can generate a high-fidelity toonified avatar that can be driven in real-time by another video with arbitrary identities. Existing related works heavily rely on multi-view modeling to recover geometry via texture embeddings, presented in a static manner, leading to control limitations. The multi-view video input also makes it difficult to deploy these models in real-world applications. To address these issues, we adopt a conditional embedding Tri-plane to learn realistic and stylized facial representations in a Gaussian deformation field. Additionally, we expand the stylization capabilities of 3D Gaussian Splatting by introducing an adaptive pixel-translation neural network and leveraging patch-aware contrastive…
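The abstract is cut off at "patch-aware contrastive…", so the paper's exact loss cannot be recovered from this page. As a hedged illustration only, the sketch below implements the generic patch-wise InfoNCE objective (often called PatchNCE, from contrastive unpaired image translation): each output patch feature is pulled toward the input patch feature at the same location and pushed away from features at other locations. The function name `patch_nce_loss`, the temperature `tau=0.07`, and the toy identity-matrix features are assumptions, not TextToon's code.

```python
import numpy as np


def patch_nce_loss(q, k, tau=0.07):
    """Generic patch-wise InfoNCE: q[i] should match k[i] and not k[j] for j != i.

    q: (N, D) features of N patches from the stylized output.
    k: (N, D) features of the patches at the same locations in the source frame.
    """
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = (q @ k.T) / tau  # (N, N) cosine similarities scaled by temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (same-location patch) as the positive class.
    return float(-np.mean(np.diag(log_probs)))


feats = np.eye(4)  # toy features: 4 patches, 4 dims
aligned = patch_nce_loss(feats, feats)
shuffled = patch_nce_loss(feats, np.roll(feats, 1, axis=0))
print(aligned < shuffled)  # True
```

When patches correspond location by location, the loss is near zero; when the correspondence is shuffled, it is large. A patch-level (rather than image-level) contrast is what lets such a loss preserve local structure, such as facial parts, while the appearance is restyled.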

Taxonomy

Topics: Human Motion and Animation · Speech and dialogue systems · Virtual Reality Applications and Impacts

Methods: Contrastive Learning