A Survey of Multimodal Sarcasm Detection
Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri

TL;DR
This paper provides the first comprehensive survey of multimodal sarcasm detection, covering models, datasets, and future directions across multiple modalities like text, audio, images, and video.
Contribution
It systematically reviews recent research on multimodal sarcasm detection, highlighting the integration of various modalities and identifying gaps for future exploration.
Findings
Multimodal models improve sarcasm detection accuracy.
Diverse datasets enable better model training.
Future research should focus on cross-modal integration.
Abstract
Sarcasm is a rhetorical device used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication, motivating the use of computational models to identify it automatically. While the clear majority of approaches to sarcasm detection have been carried out on text only, sarcasm detection often requires additional information present in tonality, facial expression, and contextual images. This has led to the introduction of multimodal models, opening the possibility of detecting sarcasm in multiple modalities such as audio, images, text, and video. In this paper, we present the first comprehensive survey on multimodal sarcasm detection (henceforth MSD) to date. We survey papers published between 2018 and 2023 on the topic, and discuss the models and datasets used for this task. We also present…
