Knowledge Distillation with Noisy Labels for Natural Language Understanding

Shivendra Bhardwaj; Abbas Ghaddar; Ahmad Rashid; Khalil Bibi; Chengyang Li; Ali Ghodsi; Philippe Langlais; Mehdi Rezagholizadeh

arXiv:2109.10147 · cs.CL · September 22, 2021

TL;DR

This paper investigates the effects of noisy labels on Knowledge Distillation in Natural Language Understanding and proposes two methods to mitigate label noise, demonstrating effectiveness on the GLUE benchmark.

Contribution

To the authors' knowledge, this is the first study to analyze and address noisy labels in KD for NLU. It introduces two mitigation techniques and evaluates them on the GLUE benchmark.

Findings

01

Methods are effective under high noise levels

02

Label noise significantly impacts KD performance

03

More research needed for robust solutions

Abstract

Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications. However, one neglected area of research is the impact of noisy (corrupted) labels on KD. We present, to the best of our knowledge, the first study on KD with noisy labels in Natural Language Understanding (NLU). We document the scope of the problem and present two methods to mitigate the impact of label noise. Experiments on the GLUE benchmark show that our methods are effective even under high noise levels. Nevertheless, our results indicate that more research is necessary to cope with label noise under KD.
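To make the problem setting concrete, the sketch below shows a common textbook form of the KD objective (hard-label cross-entropy plus a temperature-scaled KL term toward the teacher) together with symmetric label flipping as one way to simulate corrupted labels. It is an illustration only and does not reproduce the paper's two mitigation methods, which this page does not describe. The function names, the `alpha` and `temperature` parameters, and the symmetric noise model are all assumptions introduced here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def flip_labels(labels, num_classes, noise_rate, seed=0):
    """Symmetric noise (an assumed noise model): with probability
    noise_rate, replace a label with a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * CE(student, hard label)
    + (1 - alpha) * T^2 * KL(teacher || student) at temperature T."""
    # Hard-label term: this is where a corrupted label enters the loss.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Soft-label term: match the teacher's softened distribution.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Two-class example: the student agrees with the teacher, so only the
# hard-label term changes when the label is flipped from 0 to 1.
clean = kd_loss([2.0, 0.0], [2.0, 0.0], label=0)
noisy = kd_loss([2.0, 0.0], [2.0, 0.0], label=1)
```

In this toy case the flipped label raises the loss even though the student already matches the teacher, which is one simple way to see how corrupted labels can pull the student away from the teacher's signal.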

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Neural Network Applications