TL;DR
This paper presents a multimodal approach to emoji prediction in Instagram posts that combines visual and textual data to improve accuracy, highlighting the complementary roles of images and text in emoji usage.
Contribution
It introduces a novel multimodal model that leverages both images and text for emoji prediction, demonstrating improved performance over unimodal methods.
Findings
The multimodal model outperforms text-only and image-only models.
Combining text and images improves emoji prediction accuracy.
Text and images provide complementary information for emoji use.
Abstract
Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message constitutes a modern form of communication that automatic systems are not yet used to handling. In this paper, we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts, which sometimes include emojis. We show that these emojis can be predicted using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities in a combined model improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and can therefore complement each other.
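The abstract does not specify how the text and image signals are fused in the combined model. The sketch below is only an illustration under an assumed design: simple early fusion, where a (hypothetical) text feature vector and image feature vector are concatenated and passed to a linear layer with a softmax over emoji classes. The emoji labels, feature values, weights, and the helper names `fuse` and `predict` are all made up for illustration and are not the authors' method.

```python
import math

# Hypothetical toy setup: three emoji classes and tiny feature vectors.
# In a real system the vectors would come from a text encoder and an
# image encoder; the numbers here are invented for illustration only.
EMOJIS = ["heart", "laugh", "sun"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_vec, image_vec):
    """Assumed early fusion: concatenate text and image feature vectors."""
    return list(text_vec) + list(image_vec)


def predict(features, weights, biases):
    """Linear layer (one weight row per emoji) followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical encoder outputs for one Instagram post.
text_vec = [0.9, 0.1]
image_vec = [0.2, 0.8]

# Hypothetical learned parameters: 3 classes x 4 fused features.
weights = [
    [1.0, 0.0, 0.0, 0.5],  # "heart"
    [0.0, 1.0, 0.0, 0.0],  # "laugh"
    [0.0, 0.0, 0.5, 1.0],  # "sun"
]
biases = [0.0, 0.0, 0.0]

features = fuse(text_vec, image_vec)
probs = predict(features, weights, biases)
best = EMOJIS[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Because the image features occupy their own columns in the fused vector, the classifier can learn weights for visual cues that the text alone does not carry, which is one simple way two modalities can complement each other.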
