A Face-to-Face Neural Conversation Model

Hang Chu; Daiqing Li; Sanja Fidler

arXiv:1812.01525 · cs.CV · December 5, 2018 · 1 citation

Open Access

TL;DR

This paper introduces a neural conversation model that integrates facial gestures with verbal text, enabling more natural and expressive face-to-face dialogue simulation.

Contribution

It presents a novel RNN encoder-decoder architecture that jointly models facial gestures and text, trained on movie data for realistic conversational responses.

Findings

01. Generated conversations are more natural and expressive.

02. Human studies favor the face-text model over text-only models.

03. The model can be applied to face-to-face chatting avatars.

Abstract

Neural networks have recently become good at engaging in dialog. However, current approaches are based solely on verbal text, lacking the richness of a real face-to-face conversation. We propose a neural conversation model that aims to read and generate facial gestures alongside text. This allows our model to adapt its response based on the "mood" of the conversation. In particular, we introduce an RNN encoder-decoder that exploits the movement of facial muscles as well as the verbal conversation. The decoder consists of two layers: the lower layer aims at generating the verbal response and coarse facial expressions, while the second layer fills in the subtle gestures, making the generated output smoother and more natural. We train our neural network by having it "watch" 250 movies. We showcase our joint face-text model in generating more natural conversations through…
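The two-layer decoder described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the plain tanh RNN cell, the layer sizes, the greedy word choice, the residual way the second layer adjusts the coarse gesture, and every name (params, encode, decode, FACE, and so on) are assumptions made only for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, word embedding, face-gesture vector, hidden state.
VOCAB, EMB, FACE, HID = 50, 16, 8, 32

def init(*shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(0.0, 0.1, size=shape)

params = {
    "embed": init(VOCAB, EMB),
    "enc_W": init(EMB + FACE, HID), "enc_U": init(HID, HID),
    "dec1_W": init(EMB + FACE, HID), "dec1_U": init(HID, HID),
    "word_out": init(HID, VOCAB), "coarse_out": init(HID, FACE),
    "dec2_W": init(HID + FACE, HID), "dec2_U": init(HID, HID),
    "fine_out": init(HID, FACE),
}

def rnn_step(x, h, W, U):
    """One step of a plain tanh RNN cell (assumed; the paper may use another cell)."""
    return np.tanh(x @ W + h @ U)

def encode(tokens, faces, p):
    """Read the incoming utterance: each step sees a word and a face-gesture vector."""
    h = np.zeros(HID)
    for tok, face in zip(tokens, faces):
        x = np.concatenate([p["embed"][tok], face])
        h = rnn_step(x, h, p["enc_W"], p["enc_U"])
    return h

def decode(context, steps, p, start_token=0):
    """Generate a response: words plus one face-gesture vector per step."""
    h1, h2 = context, context.copy()
    tok, face = start_token, np.zeros(FACE)
    words, gestures = [], []
    for _ in range(steps):
        # Lower layer: next word and a coarse facial expression.
        x1 = np.concatenate([p["embed"][tok], face])
        h1 = rnn_step(x1, h1, p["dec1_W"], p["dec1_U"])
        tok = int(np.argmax(h1 @ p["word_out"]))  # greedy choice, an assumption
        coarse = np.tanh(h1 @ p["coarse_out"])
        # Second layer: fills in a subtle adjustment on top of the coarse gesture.
        x2 = np.concatenate([h1, coarse])
        h2 = rnn_step(x2, h2, p["dec2_W"], p["dec2_U"])
        face = coarse + np.tanh(h2 @ p["fine_out"])  # residual form is an assumption
        words.append(tok)
        gestures.append(face)
    return words, np.stack(gestures)

# Toy input: three word ids and three random face-gesture vectors.
tokens = [3, 7, 1]
faces = rng.normal(size=(3, FACE))
context = encode(tokens, faces, params)
words, gestures = decode(context, steps=4, p=params)
```

Here words holds one token id per decoding step and gestures holds one face vector per step. In this sketch the lower layer's output is passed upward, so the second layer only has to model what the coarse expression misses.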

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Face recognition and analysis