M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis
Xingbo Wang, Jianben He, Zhihua Jin, Muqiao Yang, Yong Wang, Huamin Qu

TL;DR
M2Lens is an interactive visual analytics system that visualizes and explains how multimodal sentiment analysis models use text, voice, and facial expressions, and how these modalities interact, making the models easier to interpret.
Contribution
This paper introduces M2Lens, described as the first system to visualize and explain intra- and inter-modal interactions in multimodal sentiment models at the global, subset, and local levels.
Findings
M2Lens visualizes the influence of three interaction types (dominance, complement, and conflict) on model predictions.
The system helps users understand multimodal feature importance.
Case studies show that the system helps users gain insights into model behavior.
Abstract
Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based techniques and work like black boxes. It is not clear how models utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, they often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2Lens, to…
