Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs
Ananya Pandey, Dinesh Kumar Vishwakarma

TL;DR
This paper introduces a contrastive learning-based multimodal architecture that combines image and text data to improve emoticon prediction accuracy on social media content.
Contribution
It proposes a novel dual-branch encoder with contrastive learning for joint text-image feature mapping, outperforming existing methods in emoticon prediction.
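As a rough illustration of what a contrastive objective over paired text and image embeddings can look like, here is a minimal sketch of a symmetric InfoNCE-style loss in plain Python. This summary does not state the paper's exact loss, encoders, or hyperparameters, so the function names, the `temperature` value, and the toy embeddings below are all assumptions made for illustration, not the authors' implementation.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. the i-th text is paired with the i-th image.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    # temperature=0.07 is an assumed value, not taken from the paper.
    text = [l2_normalize(v) for v in text_emb]
    image = [l2_normalize(v) for v in image_emb]
    # Pairwise similarity matrix: rows are texts, columns are images.
    logits = [
        [sum(a * b for a, b in zip(t, im)) / temperature for im in image]
        for t in text
    ]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-text direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Hypothetical 2-D embeddings standing in for the outputs of the two branches.
toy_text = [[1.0, 0.0], [0.0, 1.0]]
toy_image_matched = [[0.9, 0.1], [0.1, 0.9]]
toy_image_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = symmetric_contrastive_loss(toy_text, toy_image_matched)
loss_swapped = symmetric_contrastive_loss(toy_text, toy_image_swapped)
print(loss_matched < loss_swapped)
```

The loss is low when each text embedding is closest to its own image and high when the pairs are swapped, which is the property that pulls matching text-image pairs together in a shared feature space.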
Findings
Achieved 91% accuracy and a 90% MCC score on a Twitter dataset.
Demonstrated superior robustness and generalization over previous multimodal approaches.
Deep features from contrastive learning enhance emoticon recognition across modalities.
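For readers unfamiliar with the two metrics reported above, the following sketch computes accuracy and a multiclass Matthews correlation coefficient (MCC) in plain Python. It uses the standard multiclass generalization of MCC; whether the paper computes MCC in exactly this way is an assumption, and the labels below are invented toy data, not results from the paper.

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def multiclass_mcc(y_true, y_pred):
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correct predictions
    t_counts = Counter(y_true)                           # true occurrences per class
    p_counts = Counter(y_pred)                           # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when only one class is ever predicted).
    return cov_tp / denom if denom else 0.0

# Hypothetical emoticon labels for six messages.
toy_true = ["joy", "joy", "sad", "sad", "fire", "fire"]
toy_pred = ["joy", "joy", "sad", "fire", "fire", "fire"]

toy_accuracy = accuracy(toy_true, toy_pred)   # 5 of 6 correct
toy_mcc = multiclass_mcc(toy_true, toy_pred)
print(toy_accuracy, toy_mcc)
```

Unlike accuracy, MCC accounts for how predictions are distributed across classes, so it is less easily inflated by a model that favors frequent emoticons.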
Abstract
The emoticons are symbolic representations that generally accompany the textual content to visually enhance or summarize the true intention of a written message. Although widely utilized in the realm of social media, the core semantics of these emoticons have not been extensively explored based on multiple modalities. Incorporating textual and visual information within a single message develops an advanced way of conveying information. Hence, this research aims to analyze the relationship among sentences, visuals, and emoticons. For an orderly exposition, this paper initially provides a detailed examination of the various techniques for extracting multimodal features, emphasizing the pros and cons of each method. Through conducting a comprehensive examination of several multimodal algorithms, with specific emphasis on the fusion approaches, we have proposed a novel contrastive learning…
Taxonomy
Topics: Digital Communication and Language · Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection
Methods: Contrastive Learning
