Improving Question Answering Performance Using Knowledge Distillation and Active Learning

Yasaman Boreshban; Seyed Morteza Mirbostani; Gholamreza Ghassem-Sani; Seyed Abolghasem Mirroshandel; Shahin Amiriparian

arXiv:2109.12662 · cs.CL · September 28, 2021

TL;DR

This paper introduces a combined knowledge distillation and active learning approach to significantly reduce the complexity and data requirements of question answering systems, achieving comparable performance with fewer resources.

Contribution

It presents a novel knowledge distillation (KD) method for compressing BERT and integrates active learning (AL) strategies to minimize annotation effort, enabling high performance with less labeled data and smaller models.
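The listing does not describe how the paper's KD method works, so the following is only a minimal sketch of the generic soft-target distillation loss (temperature-softened teacher distribution, KL term scaled by T², blended with hard-label cross-entropy), applied to the start- and end-position logits of an extractive QA model. This is an assumption for illustration, not the authors' method; the function names and the `temperature` and `alpha` parameters are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two distributions over the same token positions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      gold_index: int,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """Hypothetical blend: alpha * soft-target KL (scaled by T^2) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # T^2 keeps the soft-target gradients on a comparable scale as T changes.
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Ordinary cross-entropy against the gold token position (T = 1).
    hard = -math.log(softmax(student_logits)[gold_index])
    return alpha * soft + (1 - alpha) * hard


def qa_distillation_loss(student_start, student_end, teacher_start, teacher_end,
                         gold_start, gold_end, temperature=2.0, alpha=0.5) -> float:
    """Average the distillation loss over the start- and end-position distributions."""
    start = distillation_loss(student_start, teacher_start, gold_start, temperature, alpha)
    end = distillation_loss(student_end, teacher_end, gold_end, temperature, alpha)
    return 0.5 * (start + end)
```

When the student's logits already match the teacher's, the KL term vanishes and only the (1 - alpha)-weighted cross-entropy remains, so the teacher's signal matters exactly where the two models disagree.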

Findings

01. The model matches the performance of 6-layer TinyBERT and DistilBERT while using only 2% of their total parameters.
02. It achieves state-of-the-art results on SQuAD using only 20% of the training data.
03. The approach substantially reduces both computational and annotation costs.

Abstract

Contemporary question answering (QA) systems, including transformer-based architectures, suffer from increasing computational and model complexity, which renders them inefficient for real-world applications with limited resources. Furthermore, training or even fine-tuning such models requires a vast amount of labeled data, which is often not available for the task at hand. In this manuscript, we conduct a comprehensive analysis of these challenges and introduce suitable countermeasures. We propose a novel knowledge distillation (KD) approach to reduce the parameter and model complexity of a pre-trained BERT system, and we use multiple active learning (AL) strategies to greatly reduce annotation effort. In particular, we demonstrate that our model achieves the performance of a 6-layer TinyBERT and DistilBERT, whilst using only 2% of their total parameters. Finally, by the…
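The abstract does not name the specific AL strategies, so the following is a minimal sketch of one common choice, pool-based uncertainty sampling, under stated assumptions: uncertainty is the entropy of the predicted start-position distribution plus that of the end-position distribution, and labeling stops once a fixed fraction of the pool (0.2 here, echoing the 20% figure above) has been selected. The `train` hook, the predictor interface, and the round schedule are hypothetical.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical interface: maps an example id to (start_probs, end_probs) over token positions.
Predictor = Callable[[Hashable], Tuple[Sequence[float], Sequence[float]]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability positions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def span_uncertainty(start_probs: Sequence[float], end_probs: Sequence[float]) -> float:
    """Sum of start- and end-position entropies; higher means the model is less certain."""
    return entropy(start_probs) + entropy(end_probs)


def select_batch(pool_ids: Sequence[Hashable], predict: Predictor, k: int) -> List[Hashable]:
    """Return the k unlabeled ids whose predicted spans are most uncertain."""
    scored = [(span_uncertainty(*predict(i)), i) for i in pool_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def active_learning_loop(pool_ids: Sequence[Hashable],
                         train: Callable[[List[Hashable]], Predictor],
                         label_fraction: float = 0.2,
                         rounds: int = 4) -> List[Hashable]:
    """Grow the labeled set to label_fraction of the pool over several rounds.

    train(labeled_ids) is a hypothetical hook that fine-tunes a model on the
    labeled ids and returns a predictor; in practice a small random seed set
    is usually labeled before the first round.
    """
    budget = int(label_fraction * len(pool_ids))
    per_round = max(1, budget // rounds)
    labeled: List[Hashable] = []
    unlabeled = list(pool_ids)
    while len(labeled) < budget and unlabeled:
        predict = train(labeled)  # retrain on everything labeled so far
        k = min(per_round, budget - len(labeled))
        picked = select_batch(unlabeled, predict, k)
        labeled.extend(picked)  # stands in for sending these examples to annotators
        chosen = set(picked)
        unlabeled = [i for i in unlabeled if i not in chosen]
    return labeled
```

Summing the two entropies treats the start and end positions as independent, which is a simplification; a span-level score that respects the start-before-end constraint is a natural alternative, as are margin or least-confidence criteria.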

Code & Models

Repositories

mirbostani/QA-KD-AL (PyTorch, official)

Taxonomy

Topics: Expert finding and Q&A systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Knowledge Distillation · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Softmax