Enhancing Romanian Offensive Language Detection through Knowledge Distillation, Multi-Task Learning, and Data Augmentation
Vlad-Cristian Matei, Iulian-Marius Tăiatu, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel

TL;DR
This paper presents a comprehensive approach combining knowledge distillation, multi-task learning, and data augmentation to improve the efficiency and performance of Romanian offensive language detection models.
Contribution
It introduces a novel combination of techniques specifically tailored for Romanian offensive language detection, achieving improved model efficiency and accuracy.
Findings
Enhanced detection accuracy through data augmentation.
Improved model efficiency via knowledge distillation.
Effective multi-task learning with diverse datasets.
Abstract
This paper highlights the significance of natural language processing (NLP) within artificial intelligence, underscoring its pivotal role in comprehending and modeling human language. Recent advancements in NLP, particularly in conversational bots, have garnered substantial attention and adoption among developers. This paper explores advanced methodologies for attaining smaller and more efficient NLP models. Specifically, we employ three key approaches: (1) training a Transformer-based neural network to detect offensive language, (2) employing data augmentation and knowledge distillation techniques to increase performance, and (3) incorporating multi-task learning with knowledge distillation and teacher annealing using diverse datasets to enhance efficiency. The culmination of these methods has yielded demonstrably improved outcomes.
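The abstract names knowledge distillation with teacher annealing but gives no formula. As a rough illustration only, the sketch below shows one common form of the idea: the student is trained against a target that mixes the teacher's soft predictions with the gold label, and the weight shifts from teacher to gold label as training proceeds. The function names, the linear schedule, and the temperature parameter are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_target(gold_index, teacher_logits, step, total_steps, temperature=1.0):
    """Mix the one-hot gold label with the teacher distribution.

    Assumed schedule: the gold-label weight grows linearly from 0 to 1,
    so early training follows the teacher and late training follows the labels.
    """
    gold_weight = min(1.0, max(0.0, step / total_steps))
    teacher_probs = softmax(teacher_logits, temperature)
    return [
        gold_weight * (1.0 if i == gold_index else 0.0) + (1.0 - gold_weight) * p
        for i, p in enumerate(teacher_probs)
    ]


def distillation_loss(student_logits, target_probs):
    """Cross-entropy between the mixed target and the student prediction."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, student_probs) if t > 0.0
    )


# Hypothetical binary example: class 1 = offensive, class 0 = not offensive.
teacher_logits = [0.2, 1.5]
student_logits = [0.0, 0.4]
for step in (0, 50, 100):
    target = annealed_target(1, teacher_logits, step, total_steps=100)
    print(step, [round(t, 3) for t in target], round(distillation_loss(student_logits, target), 4))
```

At step 0 the target equals the teacher distribution, and at the final step it equals the one-hot gold label. In a multi-task setting, a loss of this kind would typically be computed per task and summed, with one teacher per task; that arrangement is likewise an assumption here.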
Taxonomy
Methods: Softmax · Attention Is All You Need · Knowledge Distillation
