Contextual Attention-Based Multimodal Fusion of LLM and CNN for Sentiment Analysis

Meriem Zerkouk; Miloud Mihoubi; Belkacem Chikhaoui

arXiv:2508.13196 · cs.LG · August 20, 2025

TL;DR

This paper presents a multimodal sentiment analysis method that fuses CNN image features and LLM text features through a contextual attention mechanism, improving accuracy by 2.43% and F1-score by 5.18% over baselines on crisis-related social media data.

Contribution

It introduces a fusion approach that uses contextual attention to integrate CNN and LLM features for sentiment analysis of social media posts during natural disasters.

Findings

1. Achieved 2.43% higher accuracy over baselines.

2. Improved F1-score by 5.18%.

3. Enhanced understanding of crisis-related social media sentiments.

Abstract

This paper introduces a novel approach for multimodal sentiment analysis on social media, particularly in the context of natural disasters, where understanding public sentiment is crucial for effective crisis management. Unlike conventional methods that process text and image modalities separately, our approach integrates Convolutional Neural Network (CNN)-based image analysis with Large Language Model (LLM)-based text processing, leveraging Generative Pre-trained Transformer (GPT) and prompt engineering to extract sentiment-relevant features from the CrisisMMD dataset. To model intermodal relationships, we introduce a contextual attention mechanism within the fusion process. Leveraging contextual-attention layers, this mechanism captures intermodality interactions, enhancing the model's comprehension of complex relationships between textual and visual…
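To make the fusion idea in the abstract concrete, here is a minimal NumPy sketch of one common way to attend across modalities: text features act as queries over image region features, and the attended image context is concatenated with the projected text features before classification. This is an illustration under stated assumptions, not the authors' architecture. The function name `cross_attention_fuse`, the feature dimensions, the three-class output, and the random weights are all hypothetical stand-ins for trained CNN and LLM encoders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(text_feats, image_feats, w_q, w_k, w_v):
    """Fuse text and image features with scaled dot-product cross-attention.

    text_feats:  (n_text, d_text) array, e.g. token embeddings from an LLM.
    image_feats: (n_img, d_img) array, e.g. region features from a CNN.
    Returns a pooled fused vector of size 2*d and the attention weights.
    """
    q = text_feats @ w_q                       # (n_text, d) queries from text
    k = image_feats @ w_k                      # (n_img, d) keys from image
    v = image_feats @ w_v                      # (n_img, d) values from image
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_text, n_img)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    attended = weights @ v                     # image context per text token
    fused = np.concatenate([q, attended], axis=-1)  # (n_text, 2*d)
    return fused.mean(axis=0), weights         # mean-pool over text tokens


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_text, d_img, d, n_classes = 16, 32, 8, 3
text_feats = rng.normal(size=(5, d_text))      # 5 text tokens
image_feats = rng.normal(size=(7, d_img))      # 7 image regions
w_q = rng.normal(size=(d_text, d))
w_k = rng.normal(size=(d_img, d))
w_v = rng.normal(size=(d_img, d))

fused, weights = cross_attention_fuse(text_feats, image_feats, w_q, w_k, w_v)

# Untrained linear classifier head over the fused vector.
w_cls = rng.normal(size=(2 * d, n_classes))
probs = softmax(fused @ w_cls)
```

In a trained model the projection matrices and classifier head would be learned jointly; the attention weights show which image regions each text token draws on.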
