TDMM-LM: Bridging Facial Understanding and Animation via Language Models

Luchuan Song; Pinxin Liu; Haiyang Liu; Zhenchao Jin; Yolo Yunlong Tang; Zichong Xu; Susan Liang; Jing Bi; Jason J Corso; Chenliang Xu

arXiv:2603.16936·cs.CV·March 19, 2026

TL;DR

This paper uses language models to both understand and generate facial animation. It builds a large synthetic dataset of facial videos and frames facial motion as a language problem, which supports tasks in both directions: motion to text and text to motion.

Contribution

The work is the first to treat facial-parameter modeling as a language task, bridging facial understanding and animation through large-scale synthetic data and language models.

Findings

01. Language models can interpret facial motion with strong generalization.

02. Language models can synthesize facial motion from text prompts effectively.

03. The approach establishes a unified framework for facial animation and understanding.

Abstract

Text-guided human body animation has advanced rapidly, yet facial animation lags due to the scarcity of well-annotated, text-paired facial corpora. To close this gap, we leverage foundation generative models to synthesize a large, balanced corpus of facial behavior. We design a prompt suite covering emotions and head motions, generate about 80 hours of facial videos with multiple generators, and fit per-frame 3D facial parameters, yielding large-scale (prompt, parameter) pairs for training. Building on this dataset, we probe language models for bidirectional competence over facial motion via two complementary tasks: (1) Motion2Language: given a sequence of 3D facial parameters, the model produces natural-language descriptions capturing content, style, and dynamics; and (2) Language2Motion: given a prompt, the model synthesizes the corresponding sequence of 3D facial parameters via…
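The pairing of prompts and per-frame parameters described above can be pictured with a small sketch. This is only an illustration of how one (prompt, parameter) pair could serve both task directions; the class name `FacePair`, the helper functions, and the 3-value parameter vectors are hypothetical and not taken from the paper, which does not specify its data format or parameter dimensionality in this excerpt.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical type alias: one list of floats per video frame.
ParamSequence = List[List[float]]


@dataclass
class FacePair:
    """One training pair: a text prompt and the fitted per-frame 3D facial parameters."""
    prompt: str
    params: ParamSequence


def motion2language_example(pair: FacePair) -> Tuple[ParamSequence, str]:
    """Motion2Language direction: parameters are the input, text is the target."""
    return pair.params, pair.prompt


def language2motion_example(pair: FacePair) -> Tuple[str, ParamSequence]:
    """Language2Motion direction: text is the input, parameters are the target."""
    return pair.prompt, pair.params


# Toy sample with two frames and an assumed 3-dimensional parameter vector.
sample = FacePair(
    prompt="a person smiles and nods slowly",
    params=[[0.0, 0.1, 0.2], [0.1, 0.2, 0.3]],
)

m2l_input, m2l_target = motion2language_example(sample)
l2m_input, l2m_target = language2motion_example(sample)
```

The same stored pair yields a training example for each task simply by swapping which side is the input and which is the target; how the parameters are actually presented to the language model is not stated in the text above, so the sketch leaves that step out.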

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation