MRecGen: Multimodal Appropriate Reaction Generator
Jiaqi Xu, Cheng Luo, Weicheng Xie, Linlin Shen, Xiaofeng Liu, Lu Liu, Hatice Gunes, Siyang Song

TL;DR
This paper introduces MRecGen, a novel multimodal framework that generates synchronized verbal and non-verbal human reactions in response to user behavior, enhancing human-computer interaction realism.
Contribution
It presents the first multimodal reaction generation framework capable of producing synchronized text, audio, and video responses for human-like interactions.
Findings
Generates realistic, synchronized multimodal reactions
Applicable to virtual agents and robots
Demonstrates improved interaction naturalness
Abstract
Verbal and non-verbal human reaction generation is a challenging task, as different reactions could be appropriate for responding to the same behaviour. This paper proposes the first multiple and multimodal (verbal and nonverbal) appropriate human reaction generation framework that can generate appropriate and realistic human-style reactions (displayed in the form of synchronised text, audio and video streams) in response to an input user behaviour. This novel technique can be applied to various human-computer interaction scenarios by generating appropriate virtual agent/robot behaviours. Our demo is available at https://github.com/SSYSteve/MRecGen.
Taxonomy
Topics: Speech and dialogue systems · Social Robot Interaction and HRI · AI in Service Interactions
