A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis
Ashima Yadav, Dinesh Kumar Vishwakarma

TL;DR
This paper introduces a Deep Multi-Level Attentive network that models complex correlations between image and text modalities to enhance multimodal sentiment analysis performance.
Contribution
It proposes a novel multi-level attention framework that captures detailed interactions between image regions and textual semantics for improved sentiment classification.
Findings
Outperforms existing methods on four real-world datasets.
Effectively models correlations between image and text modalities.
Demonstrates significant accuracy improvements in multimodal sentiment analysis.
Abstract
Multimodal sentiment analysis has attracted increasing attention and has broad application prospects. Existing methods focus on a single modality and therefore fail to capture social media content that spans multiple modalities. Moreover, in multimodal learning, most work has simply combined the two modalities without exploring the complicated correlations between them, which has resulted in unsatisfactory performance for multimodal sentiment classification. Motivated by the status quo, we propose a Deep Multi-Level Attentive network, which exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate the bi-attentive visual map along the spatial and channel dimensions to magnify the CNN's representation power. Then we model the correlation between the image regions and semantics of the word by extracting the textual features…
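The abstract's "bi-attentive visual map along the spatial and channel dimensions" can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a CBAM-style design (channel attention followed by spatial attention), and the function names, the random weights, the reduction ratio `r`, and the scalar mixing weights that stand in for a learned convolution are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight each channel of a (C, H, W) feature map.

    Assumption: a shared two-layer MLP is applied to the average-pooled
    and max-pooled channel descriptors, CBAM-style.
    """
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU bottleneck

    weights = sigmoid(mlp(avg) + mlp(mx))    # (C,), each in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Reweight each spatial location of a (C, H, W) feature map.

    Assumption: two scalars stand in for the learned convolution that
    would normally mix the pooled maps.
    """
    avg = feat.mean(axis=0)  # (H, W)
    mx = feat.max(axis=0)    # (H, W)
    weights = sigmoid(w_avg * avg + w_max * mx)  # (H, W), each in (0, 1)
    return feat * weights[None, :, :]


def bi_attentive_map(feat, w1, w2):
    # Channel attention first, then spatial attention (assumed ordering).
    return spatial_attention(channel_attention(feat, w1, w2))


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                       # r: hypothetical reduction ratio
feat = rng.standard_normal((C, H, W))         # stand-in for CNN features
w1 = rng.standard_normal((C // r, C)) * 0.1   # untrained, illustrative weights
w2 = rng.standard_normal((C, C // r)) * 0.1
out = bi_attentive_map(feat, w1, w2)
print(out.shape)
```

Because both attention weights lie in (0, 1), the output keeps the input's shape and each activation is scaled down by a location-dependent and channel-dependent factor. In a trained network those factors would be learned so that sentiment-relevant regions and channels are suppressed least.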
