StandUp4AI: A New Multilingual Dataset for Humor Detection in Stand-up Comedy Videos

Valentin Barriere; Nahuel Gomez; Leo Hemamou; Sofia Callejas; Brian Ravenet

arXiv:2505.18903·cs.CL·May 27, 2025

TL;DR

This paper introduces a multilingual, multimodal dataset of stand-up comedy videos for humor detection, covering seven languages and more than 330 hours. It frames humor detection as word-level sequence labeling rather than binary sequence classification, and proposes a method to enhance automatic laughter detection.

Contribution

It presents what is, at the time of writing, the largest and most diverse dataset available for humor detection in stand-up comedy, together with a word-level sequence labeling formulation of the task and an improvement to automatic laughter detection.

Findings

01

The dataset covers seven languages (English, French, Spanish, Italian, Portuguese, Hungarian and Czech) and more than 330 hours of video.

02

Word-level sequence labeling takes the full context of a sequence into account, unlike binary sequence classification.

03

Enhanced laughter detection improves humor recognition accuracy.

Abstract

Aiming to improve current computational models of humor detection, we propose a new multimodal dataset of stand-up comedy in seven languages: English, French, Spanish, Italian, Portuguese, Hungarian and Czech. Our dataset of more than 330 hours is, at the time of writing, the biggest available for this type of task, and the most diverse. The whole dataset is automatically annotated for audience laughter, and the subset held out for model validation is manually annotated. Contrary to contemporary approaches, we do not frame humor detection as binary sequence classification but as word-level sequence labeling, in order to take into account the full context of the sequence and to capture the continuous joke-tagging mechanism that typically occurs in natural conversation. Alongside unimodal baseline results, we propose a method to…
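To make the contrast between the two task framings concrete, here is a minimal Python sketch. It is not the authors' annotation pipeline: the `Word` type, the `label_words` and `label_sequence` helpers, the `max_gap` threshold and the toy timestamps are all hypothetical, and the rule "a word is positive if audience laughter starts shortly after it ends" is an assumption chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # word onset, in seconds
    end: float    # word offset, in seconds


def label_words(words: List[Word],
                laughs: List[Tuple[float, float]],
                max_gap: float = 0.3) -> List[int]:
    """Word-level sequence labeling: one 0/1 label per word.

    Assumed rule (illustrative only): a word gets label 1 when some
    laughter interval starts between 0 and max_gap seconds after the word ends.
    """
    labels = []
    for word in words:
        triggers_laugh = any(
            0.0 <= laugh_start - word.end <= max_gap
            for laugh_start, _laugh_end in laughs
        )
        labels.append(1 if triggers_laugh else 0)
    return labels


def label_sequence(words: List[Word],
                   laughs: List[Tuple[float, float]],
                   max_gap: float = 0.3) -> int:
    """Binary sequence classification: a single 0/1 label for the whole span."""
    return int(any(label_words(words, laughs, max_gap)))


# Toy transcript with hypothetical timestamps and one laughter interval.
words = [
    Word("so", 0.0, 0.2),
    Word("my", 0.2, 0.4),
    Word("cat", 0.4, 0.8),
    Word("quit", 0.8, 1.1),
    Word("anyway", 3.2, 3.6),
]
laughs = [(1.25, 3.0)]

word_labels = label_words(words, laughs)        # one label per word
sequence_label = label_sequence(words, laughs)  # one label for the sequence
print(list(zip([w.text for w in words], word_labels)), sequence_label)
```

The per-word output keeps the position of the laughter trigger inside the utterance, whereas the single sequence label only records that laughter occurred somewhere in the span.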

Taxonomy

Topics: Humor Studies and Applications · Video Analysis and Summarization