UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification
Andrei Paraschiv, Mihai Dascalu, Dumitru-Clementin Cercel

TL;DR
This paper presents models that improve misogyny detection in memes by enhancing UNITER with an image sentiment classifier and a Vocabulary Graph Convolutional Network (VGCN), achieving competitive results in SemEval-2022 Task 5.
Contribution
The paper introduces two novel UNITER-based models for multimedia misogyny detection, one enhanced with an image sentiment classifier and the other with a Vocabulary Graph Convolutional Network (VGCN), along with an ensemble approach.
Findings
The best model achieved an F1-score of 71.4% in Sub-task A
Ensemble improved overall performance
Models ranked in the upper third of the leaderboard
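
For readers unfamiliar with the metric behind the 71.4% figure, the following is a minimal sketch of how an F1-score can be computed for a binary misogynous / not-misogynous task. It assumes macro-averaging over the two classes; this summary does not state which averaging the reported score uses. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy labels (hypothetical): 1 = misogynous, 0 = not misogynous.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = macro_f1(y_true, y_pred)
```

Macro-averaging weights both classes equally, so a system cannot score well by predicting only the majority class.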
Abstract
In recent times, the detection of hate-speech, offensive, or abusive language in online media has become an important topic in NLP research due to the exponential growth of social media and the propagation of such messages, as well as their impact. Misogyny detection, even though it plays an important part in hate-speech detection, has not received the same attention. In this paper, we describe our classification systems submitted to the SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification. The shared task aimed to identify misogynous content in a multi-modal setting by analysing meme images together with their textual captions. To this end, we propose two models based on the pre-trained UNITER model, one enhanced with an image sentiment classifier, whereas the second leverages a Vocabulary Graph Convolutional Network (VGCN). Additionally, we explore an ensemble…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts
Methods: UNiversal Image-TExt Representation Learning (UNITER)
