LRG at SemEval-2020 Task 7: Assessing the Ability of BERT and Derivative Models to Perform Short-Edits based Humor Grading
Siddhant Mahurkar, Rajaswa Patil

TL;DR
This study evaluates BERT and its derivatives for short-edits humor grading, demonstrating their strong generalization abilities across datasets and tasks, with insights into the role of self-attention layers.
Contribution
It provides a comprehensive assessment of BERT-based models for humor grading, including zero-shot and cross-dataset inference, and analyzes the impact of self-attention on humor classification.
Findings
All the pre-trained BERT derivative models tested (RoBERTa, DistilBERT, and ALBERT) show significant generalization on humor-grading tasks.
Pre-trained BERT derivatives perform well in zero-shot and cross-dataset settings.
A qualitative analysis of the final-layer self-attention weights of the trained BERT model gives insight into the role of self-attention in humor grading.
Abstract
In this paper, we assess the ability of BERT and its derivative models (RoBERTa, DistilBERT, and ALBERT) for short-edits based humor grading. We test these models for humor grading and classification tasks on the Humicroedit and the FunLines datasets. We perform extensive experiments with these models to test their language modeling and generalization abilities via zero-shot inference and cross-dataset inference based approaches. Further, we also inspect the role of self-attention layers in humor-grading by performing a qualitative analysis over the self-attention weights from the final layer of the trained BERT model. Our experiments show that all the pre-trained BERT derivative models exhibit significant generalization capabilities for humor-grading related tasks.
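To make the idea of inspecting self-attention weights concrete, here is a minimal sketch of how scaled dot-product attention weights are computed and read off for a single token. It is a toy illustration and not the authors' analysis code: the tokens and two-dimensional vectors are made up, and the same vectors serve as both queries and keys, whereas a real BERT layer applies separate learned projections in each attention head.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return one row of attention weights per query: softmax(q . k / sqrt(d))."""
    d = len(keys[0])
    rows = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        rows.append(softmax(scores))
    return rows


def most_attended(tokens, weights, query_index=0):
    """Return the token (and its weight) that the chosen query attends to most."""
    row = weights[query_index]
    best = max(range(len(row)), key=row.__getitem__)
    return tokens[best], row[best]


# Hypothetical tokens and toy 2-d vectors (assumptions, not data from the paper).
tokens = ["[CLS]", "president", "eats", "pizza"]
vectors = [[0.1, 0.9], [1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]

# Simplification: queries and keys share the same vectors.
weights = attention_weights(vectors, vectors)
top_token, top_weight = most_attended(tokens, weights, query_index=0)
print(top_token, round(top_weight, 3))
```

In a qualitative analysis of this kind, the row for a chosen token (here the first one) shows which other tokens it draws on most heavily. With a trained model, those rows would come from the model's final attention layer rather than from hand-written vectors.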
Taxonomy
Methods: Linear Layer · DistilBERT · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay
