PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition
Piotr Janiszewski, Mateusz Skiba, Urszula Walińska

TL;DR
This paper presents a method for offensive language recognition using aggregated features from fine-tuned Transformer models BERT and XLNet, achieving competitive results in SemEval-2020 tasks.
Contribution
The novel approach combines hidden layer features from BERT and XLNet for improved offensive language detection performance.
Findings
Achieved 64.727% macro F1-score in offense target identification
Ranked 7th out of 40 in Sub-task C
Achieved 89.726% F1-score in offensive language identification (Sub-task A, ranked 64th out of 85)
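The Sub-task C score above is a macro-averaged F1, meaning F1 is computed per class and then averaged with equal weight regardless of class frequency. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions, not the task's official scorer or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with made-up predictions over three target classes.
y_true = ["IND", "IND", "GRP", "OTH"]
y_pred = ["IND", "GRP", "GRP", "OTH"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a rare class that is predicted poorly lowers the macro score as much as a frequent one would.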
Abstract
In this paper, we describe the PUM team's entry to the SemEval-2020 Task 12. Creating our solution involved leveraging two well-known pretrained models used in natural language processing: BERT and XLNet, which achieve state-of-the-art results in multiple NLP tasks. The models were fine-tuned for each subtask separately, and features taken from their hidden layers were combined and fed into a fully connected neural network. The model using aggregated Transformer features can serve as a powerful tool for the offensive language identification problem. Our team was ranked 7th out of 40 in Sub-task C (Offense target identification) with a 64.727% macro F1-score and 64th out of 85 in Sub-task A (Offensive language identification) with an 89.726% F1-score.
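The aggregation step described in the abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract only says the features were "combined", so concatenation is assumed here, and the feature sizes, the single ReLU hidden layer, the random weights, and all function names are hypothetical. Random vectors stand in for the hidden-layer features that fine-tuned BERT and XLNet models would produce.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(bert_features, xlnet_features, hidden, output):
    # Assumption: "combined" is implemented as concatenation of the two vectors.
    combined = bert_features + xlnet_features
    h = relu(dense(combined, *hidden))
    return softmax(dense(h, *output))


rng = random.Random(0)
# Stand-ins for hidden-layer features from the two fine-tuned models (sizes are arbitrary).
bert_features = [rng.gauss(0, 1) for _ in range(8)]
xlnet_features = [rng.gauss(0, 1) for _ in range(8)]
hidden = init_layer(rng, n_out=4, n_in=16)
output = init_layer(rng, n_out=3, n_in=4)  # e.g. three target classes in Sub-task C
probs = classify(bert_features, xlnet_features, hidden, output)
```

In a real pipeline the classifier head would be trained on the extracted features. The sketch only shows the data flow from two feature vectors to class probabilities.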
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Byte Pair Encoding · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay
