Contextual Attention-Based Multimodal Fusion of LLM and CNN for Sentiment Analysis
Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui

TL;DR
This paper presents a multimodal sentiment analysis method that fuses CNN image features and LLM text features through a contextual attention mechanism, improving accuracy by 2.43% and F1-score by 5.18% over baselines when classifying crisis-related social media data.
Contribution
It introduces a new fusion approach using contextual attention to effectively integrate CNN and LLM features for enhanced sentiment analysis during natural disasters.
Findings
Achieved 2.43% higher accuracy over baselines.
Improved F1-score by 5.18%.
Enhanced understanding of crisis-related social media sentiments.
Abstract
This paper introduces a novel approach for multimodal sentiment analysis on social media, particularly in the context of natural disasters, where understanding public sentiment is crucial for effective crisis management. Unlike conventional methods that process text and image modalities separately, our approach integrates Convolutional Neural Network (CNN) based image analysis with Large Language Model (LLM) based text processing, leveraging Generative Pre-trained Transformer (GPT) and prompt engineering to extract sentiment-relevant features from the CrisisMMD dataset. To model intermodal relationships, we introduce a contextual attention mechanism within the fusion process. Built on contextual-attention layers, this mechanism captures intermodality interactions, enhancing the model's comprehension of complex relationships between textual and visual…
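To make the idea of attention-based fusion concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the form of the contextual attention layers, so this example assumes scaled dot-product cross-attention in which text features act as queries over image features. The function names (`cross_attention`, `fuse`), the toy feature values, and the shared feature dimension are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys_values):
    """Each query attends over keys_values, used as both keys and values.

    Assumption: scaled dot-product attention without learned projections.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        context = [sum(w * v[i] for w, v in zip(weights, keys_values))
                   for i in range(d)]
        attended.append(context)
    return attended

def mean_pool(vectors):
    """Average a list of vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fuse(text_feats, image_feats):
    """Concatenate pooled text features with pooled, text-attended image context."""
    image_context = cross_attention(text_feats, image_feats)
    return mean_pool(text_feats) + mean_pool(image_context)

# Toy, made-up features: 2 text tokens and 3 image regions, dimension 4.
text_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

fused = fuse(text_feats, image_feats)
print(len(fused))  # 8: four text dimensions plus four attended image dimensions
```

In a real system the text and image features would come from the LLM and CNN encoders, would typically be projected to a common dimension by learned layers, and the fused vector would feed a sentiment classifier. None of those components are modeled here.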
