# UniFLG: Unified Facial Landmark Generator from Text or Speech

**Authors:** Kentaro Mitsui, Yukiya Hono, Kei Sawada

arXiv: 2302.14337 · 2023-05-22

## TL;DR

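UniFLG is a unified facial landmark generator that accepts either text or speech as input. It uses an end-to-end text-to-speech model to obtain latent representations common to both modalities and decodes them into facial landmarks. It improves on the state-of-the-art text-driven method in naturalness and can handle speakers for whom no facial video data is available.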

## Contribution

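The paper integrates the text-driven and speech-driven talking face frameworks into a single facial landmark generator. An end-to-end text-to-speech model supplies a series of latent representations common to text and speech, and a landmark decoder turns them into facial landmarks.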

## Key findings

- Achieves higher naturalness than the state-of-the-art text-driven method in both speech synthesis and facial landmark generation.
- Generates facial landmarks from the speech of speakers who have no facial video data.
- Also generates facial landmarks for speakers who have no speech data.

## Abstract

Talking face generation has been extensively investigated owing to its wide applicability. The two primary frameworks used for talking face generation comprise a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech. To integrate these frameworks, this paper proposes a unified facial landmark generator (UniFLG). The proposed system exploits end-to-end text-to-speech not only for synthesizing speech but also for extracting a series of latent representations that are common to text and speech, and feeds it to a landmark decoder to generate facial landmarks. We demonstrate that our system achieves higher naturalness in both speech synthesis and facial landmark generation compared to the state-of-the-art text-driven method. We further demonstrate that our system can generate facial landmarks from speech of speakers without facial video data or even speech data.
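The data flow described in the abstract can be sketched as a toy example: two input paths (text and speech) produce latent frames of the same shape, and a single landmark decoder consumes them. This is a minimal illustration under stated assumptions, not the authors' implementation. Every name, the latent size, the 3-value acoustic features, the count of 68 landmarks, and the random linear maps standing in for trained networks are hypothetical.

```python
import random

LATENT_DIM = 4       # assumed size of one latent frame (illustrative only)
NUM_LANDMARKS = 68   # assumed landmark count; not stated in this summary


def _linear(rows, cols, seed):
    """Build a fixed random weight matrix standing in for a trained network."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def _apply(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


# Hypothetical stand-ins for the text path, the speech path, and the decoder.
TEXT_W = _linear(LATENT_DIM, 1, seed=0)
SPEECH_W = _linear(LATENT_DIM, 3, seed=1)
DECODER_W = _linear(NUM_LANDMARKS * 2, LATENT_DIM, seed=2)


def latents_from_text(text):
    """Toy text path: one latent frame per character."""
    return [_apply(TEXT_W, [ord(ch) / 128.0]) for ch in text]


def latents_from_speech(frames):
    """Toy speech path: one latent frame per 3-value acoustic frame."""
    return [_apply(SPEECH_W, frame) for frame in frames]


def decode_landmarks(latents):
    """Shared decoder: map each latent frame to NUM_LANDMARKS (x, y) points."""
    sequence = []
    for z in latents:
        flat = _apply(DECODER_W, z)
        points = [(flat[2 * i], flat[2 * i + 1]) for i in range(NUM_LANDMARKS)]
        sequence.append(points)
    return sequence


def generate_landmarks(text=None, speech_frames=None):
    """Accept exactly one modality and route it through the shared decoder."""
    if (text is None) == (speech_frames is None):
        raise ValueError("provide exactly one of text or speech_frames")
    if text is not None:
        latents = latents_from_text(text)
    else:
        latents = latents_from_speech(speech_frames)
    return decode_landmarks(latents)
```

Calling `generate_landmarks(text="hi")` or `generate_landmarks(speech_frames=[[0.1, 0.2, 0.3]] * 5)` returns one list of landmark points per latent frame. The point of the sketch is only that the decoder does not depend on which modality produced the latents.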

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14337/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14337/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/2302.14337/full.md

---
Source: https://tomesphere.com/paper/2302.14337