Text-in-Image Enhanced Self-Supervised Alignment Model for Aspect-Based Multimodal Sentiment Analysis on Social Media

Xuefeng Zhao; Yuxiang Wang; Zhaoman Zhong

PMC · DOI: 10.3390/s25082553 · April 17, 2025

Open Access

TL;DR

This paper introduces a model for aspect-level sentiment analysis of social media posts that makes use of the text embedded in images and improves the alignment between the text and image modalities.

Contribution

The proposed TESAM model improves aspect-based multimodal sentiment analysis (ABMSA) by incorporating text embedded in images (text-in-image) and by using self-supervised alignment to narrow the gap between modalities.

Findings

01

TESAM achieved strong performance on three ABMSA benchmarks.

02

Self-supervised alignment, based on Euclidean distance and cosine similarity, improved consistency between modalities.

03

Fusing text-in-image with visual features enhanced image representations for sentiment analysis.
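
The second finding names two measures, Euclidean distance and cosine similarity, for judging how well two modality representations agree. The sketch below shows both on plain Python lists. The function names, the `weight` parameter, and the way the two measures are combined into a single score are assumptions made for illustration; the summary does not state how TESAM combines them.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_score(text_vec, image_vec, weight=0.5):
    """Hypothetical combined measure (an assumption, not the paper's loss).

    Lower values mean the text and image embeddings are closer:
    the Euclidean term penalizes distance, the cosine term penalizes
    a difference in direction.
    """
    distance_term = euclidean_distance(text_vec, image_vec)
    direction_term = 1.0 - cosine_similarity(text_vec, image_vec)
    return weight * distance_term + (1.0 - weight) * direction_term


if __name__ == "__main__":
    # Toy embeddings, made up for the example.
    text_embedding = [0.2, 0.7, 0.1]
    image_embedding = [0.25, 0.6, 0.2]
    print(alignment_score(text_embedding, image_embedding))
```

The two measures are complementary: Euclidean distance is sensitive to vector magnitude, while cosine similarity only compares direction, so two embeddings can point the same way and still be far apart.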

Abstract

The rapid development of social media has driven the need for opinion mining and sentiment analysis based on multimodal samples. As a fine-grained task within multimodal sentiment analysis, aspect-based multimodal sentiment analysis (ABMSA) enables the accurate and efficient determination of sentiment polarity for aspect-level targets. However, traditional ABMSA methods often perform suboptimally on social media samples, as the images in these samples typically contain embedded text that conventional models overlook. Such text influences sentiment judgment. To address this issue, we propose a text-in-image enhanced self-supervised alignment model (TESAM) that accounts for multimodal information more comprehensively. Specifically, we employ optical character recognition (OCR) technology to extract embedded text from images and, based on the principle that text-in-image is an integral part of…

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications · Text and Document Classification Technologies