Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction
Bobak Farzin, Piotr Czapla, Jeremy Howard

TL;DR
This paper presents a system that uses a Spanish pre-trained language model to predict humor in Twitter posts. It ranked near the top of the HAHA 2019 Challenge by pre-training on a large Twitter corpus and applying label smoothing to cope with noisy labels.
Contribution
It introduces a Spanish-specific language model trained from scratch and applies label smoothing to improve humor prediction accuracy in social media data.
Findings
Achieved 3rd place in classification and 2nd in regression tasks
Outperformed a Naive Bayes baseline
Applied label smoothing to cope with noisy labels
Abstract
Our entry into the HAHA 2019 Challenge placed 3rd in the classification task and 2nd in the regression task. We describe our system and innovations, and compare our results to a Naive Bayes baseline. A large Twitter-based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels, we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
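The label smoothing mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the smoothing factor `epsilon=0.1`, and the choice to spread `epsilon` uniformly over all classes are assumptions made for illustration. The idea is that the target distribution puts slightly less than full weight on the annotated class, so a mislabeled tweet pulls the model less strongly toward the wrong answer.

```python
import math

def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Build a smoothed target distribution (epsilon is an assumed value).

    Every class receives epsilon / num_classes, and the annotated class
    additionally receives the remaining 1 - epsilon.
    """
    dist = [epsilon / num_classes] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def label_smoothing_cross_entropy(logits, target_index, epsilon=0.1):
    """Cross-entropy between the smoothed targets and the model's prediction."""
    log_probs = log_softmax(logits)
    targets = smooth_labels(target_index, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# Two classes (not humorous, humorous); the tweet is annotated as humorous.
logits = [-2.0, 3.0]
plain_loss = label_smoothing_cross_entropy(logits, target_index=1, epsilon=0.0)
smoothed_loss = label_smoothing_cross_entropy(logits, target_index=1, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. With a positive `epsilon`, a very confident prediction is penalized slightly even when it matches the annotation, which is what "reducing class confidence" amounts to in this sketch.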
Taxonomy
Topics: Humor Studies and Applications · Sentiment Analysis and Opinion Mining
Methods: Label Smoothing
