TL;DR
This paper introduces M2H2, a new multimodal dataset for humor recognition in Hindi conversations, highlighting the importance of combining text, audio, and visual cues for improved humor detection.
Contribution
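The paper presents M2H2, a multimodal Hindi humor dataset of 6,191 utterances from 13 episodes of a TV series, each labeled humor or non-humor, along with several multimodal baseline models. It addresses the scarcity of humor recognition resources in languages other than English.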
Findings
Combining textual, acoustic, and visual information improves humor recognition over unimodal approaches on the M2H2 dataset.
Conversational context is important for detecting humor in an utterance.
Abstract
Humor recognition in conversations is a challenging task that has recently gained popularity due to its importance in dialogue understanding, including in multimodal settings (i.e., text, acoustics, and visual). The few existing datasets for humor are mostly in English. However, due to the tremendous growth in multilingual content, there is a great demand to build models and systems that support multilingual information access. To this end, we propose a dataset for Multimodal Multiparty Hindi Humor (M2H2) recognition in conversations containing 6,191 utterances from 13 episodes of a very popular TV series "Shrimaan Shrimati Phir Se". Each utterance is annotated with humor/non-humor labels and encompasses acoustic, visual, and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for humor recognition in…
