A Realistic Face-to-Face Conversation System based on Deep Neural Networks

Zezhou Chen; Zhaoxiang Liu; Huan Hu; Jinqiang Bai; Shiguo Lian; Fuyuan Shi; Kai Wang

arXiv:1908.07750 · cs.CV · August 22, 2019 · 5 cites


Open Access

TL;DR

This paper introduces a deep neural network-based face-to-face conversation system that generates natural facial reactions and realistic avatar images, enhancing virtual interaction experiences.

Contribution

It presents a novel system that combines two sequence-to-sequence models (one for listening, one for speaking) with a GAN-based synthesizer to produce realistic avatar faces in face-to-face conversations.
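Assuming that the "Pixel2Pixel" model named in the abstract is the pix2pix conditional GAN of Isola et al. (2017), the synthesizer's training would presumably follow the standard pix2pix objective below. This is a sketch of that general technique, not the paper's stated loss; the exact terms and the weight λ it uses are not given in this summary. Here x is the conditioning input (hypothetically, an image derived from the sequence-to-sequence models' output), y is the real facial frame, z is noise, G is the generator, and D is the discriminator.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term keeps the generated face close to the ground-truth frame pixel by pixel, while the adversarial term pushes it toward looking like a real photograph rather than a blurry average.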

Findings

01

System generates natural facial reactions.

02

Produces realistic facial images.

03

Outperforms the 3D-model-based avatar driving baseline in realism and naturalness.

Abstract

To improve the experience of face-to-face conversation with an avatar, this paper presents a novel conversation system. It consists of two sequence-to-sequence models, one for listening and one for speaking, and a realistic avatar synthesizer based on a Generative Adversarial Network (GAN). The models exploit facial actions and head pose to learn natural human reactions. From the models' output, the synthesizer uses the Pixel2Pixel model to generate realistic facial images. To demonstrate the improvement of our system, we use a 3D-model-based avatar driving scheme as a reference. We train and evaluate our neural networks on data from ESPN shows. Experimental results show that our conversation system can generate natural facial reactions and realistic facial images.
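The abstract does not specify the internals of the listening and speaking models. As a rough illustration of the data flow only, here is a minimal, untrained NumPy sketch assuming a GRU encoder-decoder in which each video frame is summarized by a feature vector of facial-action intensities plus head-pose angles. The GRU choice, the autoregressive decoding, the feature size (20), the hidden size (32), and the names `GRUCell` and `Seq2SeqReaction` are all hypothetical and not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """One GRU step with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        shape = (hidden_dim, input_dim + hidden_dim)
        # Each gate acts on the concatenation [x, h].
        self.W_z = rng.uniform(-scale, scale, shape)
        self.W_r = rng.uniform(-scale, scale, shape)
        self.W_h = rng.uniform(-scale, scale, shape)
        self.b_z = np.zeros(hidden_dim)
        self.b_r = np.zeros(hidden_dim)
        self.b_h = np.zeros(hidden_dim)

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh + self.b_z)  # update gate
        r = sigmoid(self.W_r @ xh + self.b_r)  # reset gate
        # Candidate state uses the reset-gated previous state.
        h_cand = np.tanh(self.W_h @ np.concatenate([x, r * h]) + self.b_h)
        return (1.0 - z) * h + z * h_cand


class Seq2SeqReaction:
    """Hypothetical encoder-decoder: interlocutor frames -> avatar frames."""

    def __init__(self, feat_dim=20, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_dim = feat_dim
        self.hidden_dim = hidden_dim
        self.encoder = GRUCell(feat_dim, hidden_dim, rng)
        self.decoder = GRUCell(feat_dim, hidden_dim, rng)
        # Linear read-out from hidden state to per-frame facial parameters.
        self.W_out = rng.uniform(-0.1, 0.1, (feat_dim, hidden_dim))
        self.b_out = np.zeros(feat_dim)

    def __call__(self, inputs, out_len):
        # Encode the whole input sequence into a single hidden state.
        h = np.zeros(self.hidden_dim)
        for x in inputs:
            h = self.encoder.step(x, h)
        # Decode autoregressively, feeding each predicted frame back in.
        y = np.zeros(self.feat_dim)  # all-zero start frame
        frames = []
        for _ in range(out_len):
            h = self.decoder.step(y, h)
            y = self.W_out @ h + self.b_out
            frames.append(y)
        return np.stack(frames)  # shape: (out_len, feat_dim)


# Hypothetical usage: one model for listening, one for speaking.
listening_model = Seq2SeqReaction(seed=1)
speaking_model = Seq2SeqReaction(seed=2)
interlocutor = np.random.default_rng(42).normal(size=(50, 20))  # 50 frames
reaction = listening_model(interlocutor, out_len=50)
print(reaction.shape)  # (50, 20)
```

Keeping separate listening and speaking models, as the abstract describes, lets each learn its own reaction style from the same kind of per-frame input. In the described system, the synthesizer then works from such per-frame outputs to render facial images with Pixel2Pixel; that stage is not sketched here.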

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis