A Realistic Face-to-Face Conversation System based on Deep Neural Networks
Zezhou Chen, Zhaoxiang Liu, Huan Hu, Jinqiang Bai, Shiguo Lian, Fuyuan Shi, Kai Wang

TL;DR
This paper introduces a deep neural network-based face-to-face conversation system that generates natural facial reactions and realistic avatar images, enhancing virtual interaction experiences.
Contribution
It presents a novel system combining sequence-to-sequence models and GANs for realistic avatar facial synthesis in face-to-face conversations.
Findings
System generates natural facial reactions.
Produces realistic facial images.
Outperforms a 3D-model-based avatar driving baseline in realism and naturalness.
Abstract
To improve the experience of face-to-face conversation with an avatar, this paper presents a novel conversation system. It is composed of two sequence-to-sequence models, one for listening and one for speaking, and a Generative Adversarial Network (GAN) based realistic avatar synthesizer. The models exploit facial action and head pose to learn natural human reactions. Based on the models' output, the synthesizer uses the Pixel2Pixel model to generate realistic facial images. To show the improvement of our system, we use a 3D-model-based avatar driving scheme as a reference. We train and evaluate our neural networks with data from ESPN shows. Experimental results show that our conversation system can generate natural facial reactions and realistic facial images.
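The data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class `TinySeq2Seq`, the functions `synthesize_frame` and `respond`, the feature size, and the hidden size are all hypothetical; the networks are untrained with random weights; and the GAN synthesizer is replaced by a placeholder that returns a blank image. The abstract does not say what each model takes as input, so the sketch assumes both models read the user's recent facial action and head pose features.

```python
import numpy as np

FEATURE_DIM = 20  # assumed size of one facial action + head pose vector
HIDDEN_DIM = 32   # assumed hidden state size


class TinySeq2Seq:
    """Untrained tanh-RNN encoder-decoder with random weights (illustration only)."""

    def __init__(self, feature_dim, hidden_dim, seed):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.w_in = rng.normal(0.0, scale, (feature_dim, hidden_dim))
        self.w_hh = rng.normal(0.0, scale, (hidden_dim, hidden_dim))
        self.w_out = rng.normal(0.0, scale, (hidden_dim, feature_dim))

    def _step(self, x, h):
        # One recurrent update of the hidden state.
        return np.tanh(x @ self.w_in + h @ self.w_hh)

    def predict(self, user_frames, out_len):
        # Encode the user's frames into a single hidden state.
        h = np.zeros(self.w_hh.shape[0])
        for x in user_frames:
            h = self._step(x, h)
        # Decode avatar frames one at a time, feeding each output back in.
        y = np.zeros(self.w_out.shape[1])
        outputs = []
        for _ in range(out_len):
            h = self._step(y, h)
            y = h @ self.w_out
            outputs.append(y)
        return np.stack(outputs)


def synthesize_frame(features, size=64):
    """Placeholder for the GAN-based synthesizer: returns a blank RGB image.

    A real synthesizer would map the feature vector to a realistic face.
    """
    return np.zeros((size, size, 3), dtype=np.uint8)


def respond(user_frames, user_is_speaking, listener, speaker, out_len):
    # Assumed routing: the avatar listens while the user speaks, and vice versa.
    model = listener if user_is_speaking else speaker
    avatar_features = model.predict(user_frames, out_len)
    images = [synthesize_frame(f) for f in avatar_features]
    return images, avatar_features


listener = TinySeq2Seq(FEATURE_DIM, HIDDEN_DIM, seed=0)
speaker = TinySeq2Seq(FEATURE_DIM, HIDDEN_DIM, seed=1)
user_frames = np.random.default_rng(42).normal(size=(5, FEATURE_DIM))
images, avatar_features = respond(user_frames, True, listener, speaker, out_len=3)
print(avatar_features.shape, len(images))
```

Keeping the feature prediction and the image synthesis as separate stages mirrors the split in the abstract: the sequence models only decide how the avatar should move, and the synthesizer only decides how that motion looks.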
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
