AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation
Xiyuan Gao, Shubhi Bansal, Kushaan Gowda, Zhu Li, Shekhar Nayak, Nagendra Kumar, Matt Coler

TL;DR
This paper introduces AMuSeD, a multimodal sarcasm detection model that employs data augmentation and attention mechanisms to improve accuracy using text and audio data.
Contribution
It proposes a novel bimodal data augmentation strategy and evaluates various attention mechanisms, with self-attention proving most effective for multimodal sarcasm detection.
Findings
Achieved an F1-score of 81.0% with text-audio data.
Bimodal augmentation improves sarcasm detection performance.
Self-attention outperforms other attention mechanisms in data fusion.
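To make the fusion finding concrete, here is a minimal sketch of how self-attention can combine a text embedding and an audio embedding. It is an illustration under stated assumptions, not the paper's implementation: it assumes each modality has already been encoded as a fixed-size vector, uses a single attention head, and substitutes random matrices for learned projection weights. All names (`self_attention_fuse`, `w_q`, `w_k`, `w_v`, `rand_matrix`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(text_vec, audio_vec, w_q, w_k, w_v):
    """Fuse two modality embeddings with single-head scaled dot-product self-attention.

    The two embeddings are treated as a length-2 sequence, so each modality
    can attend to itself and to the other one.
    """
    tokens = [text_vec, audio_vec]                 # 2 x d
    q = matmul(tokens, w_q)                        # 2 x d_k
    k = matmul(tokens, w_k)                        # 2 x d_k
    v = matmul(tokens, w_v)                        # 2 x d_k
    k_t = [list(col) for col in zip(*k)]           # d_k x 2
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, k_t)                        # 2 x 2
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)                  # 2 x d_k
    fused = attended[0] + attended[1]              # concatenate into one vector
    return fused, weights


def rand_matrix(rows, cols):
    """Random stand-in for learned weights or encoder outputs (assumption)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, d_k = 6, 3                                      # illustrative sizes only
text_vec = rand_matrix(1, d)[0]
audio_vec = rand_matrix(1, d)[0]
w_q, w_k, w_v = rand_matrix(d, d_k), rand_matrix(d, d_k), rand_matrix(d, d_k)

fused, weights = self_attention_fuse(text_vec, audio_vec, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and shows how much one modality draws on itself versus the other. The concatenated `fused` vector would then be passed to a classifier; the paper's actual architecture, dimensions, and training setup may differ.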
Abstract
Detecting sarcasm effectively requires a nuanced understanding of context, including vocal tones and facial expressions. The progression towards multimodal computational methods in sarcasm detection, however, faces challenges due to the scarcity of data. To address this, we present AMuSeD (Attentive deep neural network for MUltimodal Sarcasm dEtection incorporating bi-modal Data augmentation). This approach utilizes the Multimodal Sarcasm Detection Dataset (MUStARD) and introduces a two-phase bimodal data augmentation strategy. The first phase involves generating varied text samples through Back Translation from several secondary languages. The second phase involves the refinement of a FastSpeech 2-based speech synthesis system, tailored specifically for sarcasm to retain sarcastic intonations. Alongside a cloud-based Text-to-Speech (TTS) service, this Fine-tuned FastSpeech 2 system…
Taxonomy
Methods: Attention Is All You Need · Dense Connections · Multi-Head Attention · Linear Layer · Softmax · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer
