HUMBO: Bridging Response Generation and Facial Expression Synthesis

Shang-Yu Su; Po-Wei Lin; Yun-Nung Chen

arXiv:1905.11240 · cs.CL · September 1, 2021 · 1 cites

TL;DR

HUMBO is a novel multimodal dialogue system that generates responses and synthesizes facial expressions, enabling more human-like virtual assistants with customizable appearances and emotional expressions.

Contribution

It introduces a new system that combines response generation with facial expression synthesis, advancing multimodal interaction in virtual assistants.

Findings

01 Allows user customization of virtual assistant appearance

02 Generates coherent emotional responses with facial expressions

03 Presents a new multimodal interaction framework

Abstract

Spoken dialogue systems that help users solve complex tasks, such as movie ticket booking, have become an emerging research topic in artificial intelligence and natural language processing. With a well-designed dialogue system as an intelligent personal assistant, people can accomplish certain tasks more easily via natural language interactions. Several virtual intelligent assistants are on the market today; however, most focus only on textual or vocal interaction. In this paper, we present HUMBO, a system that generates dialogue responses and simultaneously synthesizes the corresponding facial expressions for better multimodal interaction. HUMBO can (1) let users determine the appearance of the virtual assistant from a single image, and (2) generate coherent emotional utterances and facial expressions on the user-provided image. This is not only a brand new…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis