Multimodal Systems: Taxonomy, Methods, and Challenges
Muhammad Zeeshan Baig, Manolya Kavakli

TL;DR
This paper provides an overview of multimodal systems, discussing their history, advantages, modalities, input modeling, fusion, data collection, and challenges, highlighting their potential to surpass unimodal systems in human-computer interaction.
Contribution
It offers a comprehensive taxonomy and analysis of multimodal systems, emphasizing recent advancements, common modalities, and research challenges in the field.
Findings
Multimodal systems improve task completion rates.
They reduce errors compared to unimodal systems.
Speech and gestures are the most common input modalities.
Abstract
Humans naturally use multiple modalities to convey information. The human brain processes these modalities both sequentially and in parallel during communication, but this changes when humans interact with computers. Empowering computers with the capability to process input multimodally is a major domain of investigation in Human-Computer Interaction (HCI). Advances in technology (powerful mobile devices, advanced sensors, new ways of output, etc.) have opened up new gateways for researchers to design systems that allow multimodal interaction. It is only a matter of time before multimodal inputs overtake traditional ways of interaction. The paper provides an introduction to the domain of multimodal systems, gives a brief history, describes the advantages of multimodal systems over unimodal systems, and discusses various modalities. The input modeling, fusion, and data…
Taxonomy
Topics: Speech and dialogue systems · Hand Gesture Recognition Systems · Robotics and Automated Systems
