StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos
Valentin Barriere, Nahuel Gomez, Leo Hemamou, Sofia Callejas, Brian Ravenet

TL;DR
This paper introduces a large multilingual, multimodal dataset of stand-up comedy videos for humor detection, frames the task as word-level sequence labeling rather than binary classification, and proposes a method to enhance automatic laughter detection.
Contribution
It presents the largest and most diverse multilingual dataset for humor detection in stand-up comedy, with a new sequence labeling approach and laughter detection improvements.
Findings
The dataset covers seven languages and over 330 hours of videos.
Sequence labeling captures humor context better than binary classification.
Enhanced laughter detection improves humor recognition accuracy.
Abstract
Aiming to improve current computational models of humor detection, we propose a new multimodal dataset of stand-up comedy in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian and Czech. Our dataset of more than 330 hours is, at the time of writing, the biggest available for this type of task, and the most diverse. The whole dataset is automatically annotated for audience laughter, and the subset held out for model validation is manually annotated. Contrary to contemporary approaches, we do not frame humor detection as binary sequence classification but as word-level sequence labeling, in order to take into account all the context of the sequence and to capture the continuous joke tagging mechanism typically occurring in natural conversations. Along with unimodal baseline results, we propose a method to…
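The difference between binary sequence classification and word-level sequence labeling can be illustrated with a minimal sketch. The labeling rule here (a word is positive when audience laughter starts within a short window after the word ends), the `Word` type, the function names, the `window` parameter, and the toy timestamps are all assumptions made for illustration; they are not the paper's annotation scheme or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def word_level_labels(words: List[Word],
                      laughs: List[Tuple[float, float]],
                      window: float = 0.5) -> List[int]:
    """Assumed rule: label a word 1 if a laughter segment starts
    within `window` seconds after the word ends, else 0."""
    labels = []
    for w in words:
        triggered = any(0.0 <= onset - w.end <= window for onset, _ in laughs)
        labels.append(1 if triggered else 0)
    return labels


def sequence_level_label(words: List[Word],
                         laughs: List[Tuple[float, float]],
                         window: float = 0.5) -> int:
    """Binary framing: a single label for the whole sequence, set by
    whether laughter follows its last word."""
    return word_level_labels(words[-1:], laughs, window)[0]


# Hypothetical transcript with word timestamps and two laughter segments.
words = [
    Word("I", 0.0, 0.2), Word("told", 0.2, 0.5), Word("my", 0.5, 0.7),
    Word("cat", 0.7, 1.1), Word("a", 2.0, 2.1), Word("joke", 2.1, 2.6),
]
laughs = [(1.3, 1.9), (2.8, 4.0)]  # (onset, offset) in seconds

print(word_level_labels(words, laughs))     # one label per word
print(sequence_level_label(words, laughs))  # one label per sequence
```

In this toy example the word-level labels mark both "cat" and "joke" as laughter triggers, while the binary framing collapses the whole sequence to a single positive label and loses the position of the first trigger.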
Taxonomy
Topics: Humor Studies and Applications · Video Analysis and Summarization
