TL;DR
This paper introduces a multi-modal dataset of stand-up comedy clips with humour scores derived from audience laughter, and uses it to train models that rate humour automatically on a five-point scale.
Contribution
The work creates a new humour-annotated dataset with audience laughter-based scores and develops models to predict humour levels from audio and text.
Findings
Achieved a quadratic weighted kappa (QWK) of 0.813 on humour rating
Validated the laughter-based scoring against manually annotated scores, with a QWK of 0.6
Released the 'Open Mic' dataset for future research
Abstract
Computational Humour (CH) has attracted the interest of Natural Language Processing and Computational Linguistics communities. Creating datasets for automatic measurement of humour quotient is difficult due to multiple possible interpretations of the content. In this work, we create a multi-modal humour-annotated dataset (40 hours) using stand-up comedy clips. We devise a novel scoring mechanism to annotate the training data with a humour quotient score using the audience's laughter. The normalized duration (laughter duration divided by the clip duration) of laughter in each clip is used to compute this humour coefficient score on a five-point scale (0-4). This method of scoring is validated by comparing with manually annotated scores, wherein a quadratic weighted kappa of 0.6 is obtained. We use this dataset to train a model that provides a "funniness" score, on a five-point…
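The scoring idea in the abstract (laughter duration divided by clip duration, mapped to a 0-4 score, then checked against manual labels with quadratic weighted kappa) can be illustrated with a minimal sketch. The bin edges in `ASSUMED_EDGES` and the function names are hypothetical; the abstract does not state the thresholds the authors used, so this shows only the general shape of such a pipeline, not the paper's implementation.

```python
from typing import Sequence

# Hypothetical bin edges on the normalized laughter duration.
# The paper's actual thresholds are not given in the abstract.
ASSUMED_EDGES = (0.05, 0.10, 0.20, 0.30)


def normalized_laughter(laughter_seconds: float, clip_seconds: float) -> float:
    """Laughter duration divided by clip duration, clamped to [0, 1]."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return min(max(laughter_seconds / clip_seconds, 0.0), 1.0)


def humour_score(laughter_seconds: float, clip_seconds: float,
                 edges: Sequence[float] = ASSUMED_EDGES) -> int:
    """Map the normalized duration to a score in 0..len(edges)."""
    ratio = normalized_laughter(laughter_seconds, clip_seconds)
    # Count how many edges the ratio reaches; four edges give scores 0-4.
    return sum(ratio >= edge for edge in edges)


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             n_classes: int = 5) -> float:
    """Agreement between two integer ratings in 0..n_classes-1."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Confusion matrix of observed rating pairs.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    if den == 0.0:
        # Both raters used a single identical class; kappa is undefined,
        # treated here as perfect agreement (an assumption of this sketch).
        return 1.0
    return 1.0 - num / den


if __name__ == "__main__":
    auto = [humour_score(s, 60.0) for s in (0.0, 4.0, 9.0, 15.0, 30.0)]
    manual = [0, 1, 2, 2, 4]
    print(auto, quadratic_weighted_kappa(auto, manual))
```

Quadratic weighting penalizes a disagreement of two points four times as heavily as a disagreement of one point, which suits an ordinal scale like this one, where a near miss should count for less than a large miss.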
