HUMBO: Bridging Response Generation and Facial Expression Synthesis
Shang-Yu Su, Po-Wei Lin, Yun-Nung Chen

TL;DR
HUMBO is a multimodal dialogue system that generates dialogue responses and simultaneously synthesizes matching facial expressions. This enables more human-like virtual assistants whose appearance can be customized from a single image and whose emotional expressions match what they say.
Contribution
The paper introduces a system that couples dialogue response generation with facial expression synthesis, advancing multimodal interaction for virtual assistants.
Findings
Lets users customize the virtual assistant's appearance from a single image
Generates coherent emotional utterances together with matching facial expressions on the user-provided image
Presents a new multimodal interaction framework
Abstract
Spoken dialogue systems that help users solve complex tasks such as movie ticket booking have become an emerging research topic in artificial intelligence and natural language processing. With a well-designed dialogue system acting as an intelligent personal assistant, people can accomplish certain tasks more easily through natural language interaction. Several virtual intelligent assistants are available on the market today; however, most focus only on textual or vocal interaction. In this paper, we present HUMBO, a system that generates dialogue responses and simultaneously synthesizes corresponding facial expressions for better multimodal interaction. HUMBO can (1) let users determine the appearance of the virtual assistant from a single image, and (2) generate coherent emotional utterances and facial expressions on the user-provided image. This is not only a brand new…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
