Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Daniel Grießhaber; Johannes Maucher; Ngoc Thang Vu

arXiv:2012.02462 · cs.CL · December 7, 2020

TL;DR

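This paper investigates fine-tuning BERT in low-resource settings (fewer than 1,000 training examples), using pool-based active learning to get more out of a fixed labeling budget. Experiments on GLUE show a performance advantage when the queried examples maximize the model's approximate knowledge gain.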

Contribution

It applies pool-based active learning to BERT fine-tuning in low-resource settings and analyzes how freezing layers of the language model reduces the number of trainable parameters.

Findings

01

Active learning improves BERT performance with limited data.

02

Freezing layers reduces the number of trainable parameters, which makes fine-tuning better suited to low-resource settings.

03

Experimental results on GLUE show a performance advantage when queries from the unlabeled pool maximize the model's approximate knowledge gain.
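
To give a sense of scale for finding 02, the following is a rough back-of-the-envelope sketch of how many encoder parameters remain trainable when the embeddings and the lowest layers are frozen. It assumes the standard BERT-base configuration (12 layers, hidden size 768, feed-forward size 3072, vocabulary of 30,522 tokens); the summary above does not say which BERT variant or which layers the authors froze, so the function name `trainable_params` and the freezing scheme are illustrative only. The task-specific classification head is left out of the count.

```python
# Assumed BERT-base dimensions; not taken from the paper summary above.
HIDDEN = 768
INTERMEDIATE = 3072
NUM_LAYERS = 12
VOCAB = 30522
MAX_POSITIONS = 512
TYPE_VOCAB = 2

LAYER_NORM = 2 * HIDDEN  # one scale and one shift vector


def linear(n_in, n_out):
    """Parameter count of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def embedding_params():
    """Token, position, and segment embeddings plus their LayerNorm."""
    return (VOCAB + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + LAYER_NORM


def encoder_layer_params():
    """One Transformer encoder layer: self-attention plus feed-forward."""
    # Query, key, value, and output projections, followed by LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + LAYER_NORM
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + LAYER_NORM
    )
    return attention + feed_forward


def pooler_params():
    """Dense layer applied to the [CLS] representation."""
    return linear(HIDDEN, HIDDEN)


def trainable_params(frozen_layers, freeze_embeddings=True):
    """Encoder parameters left trainable when the lowest layers are frozen.

    Hypothetical scheme: freeze the first `frozen_layers` encoder layers
    (and optionally the embeddings); everything above stays trainable.
    """
    if not 0 <= frozen_layers <= NUM_LAYERS:
        raise ValueError("frozen_layers must be between 0 and NUM_LAYERS")
    total = (NUM_LAYERS - frozen_layers) * encoder_layer_params() + pooler_params()
    if not freeze_embeddings:
        total += embedding_params()
    return total


if __name__ == "__main__":
    full = trainable_params(0, freeze_embeddings=False)
    for k in (0, 6, 12):
        remaining = trainable_params(k)
        print(f"frozen layers={k:2d}  trainable={remaining:,}  ({remaining / full:.1%})")
```

In a PyTorch implementation, freezing typically amounts to setting `requires_grad = False` on the parameters of the chosen layers so the optimizer no longer updates them.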

Abstract

Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods of BERT, a pre-trained Transformer-based language model, by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE data set show an advantage in model performance by maximizing the approximate knowledge gain of the model when querying from the pool of unlabeled data. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters,…
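
The query step of pool-based active learning described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' method: predictive entropy stands in for the paper's "approximate knowledge gain" criterion, which the abstract does not define, and the names `predict_proba` (the model's class-probability function) and `oracle` (the human annotator) are hypothetical. Retraining the model between rounds is omitted.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, k):
    """Return indices of the k pool examples with the highest entropy.

    Entropy is used here as a simple uncertainty score; it is an assumed
    stand-in for the acquisition function used in the paper.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


def active_learning_round(labeled, pool, predict_proba, oracle, k):
    """One round: score the pool, query k labels, move them to the labeled set.

    `predict_proba` and `oracle` are hypothetical callables supplied by the
    caller; the model would be fine-tuned again on `labeled` after each round.
    """
    pool_probs = [predict_proba(x) for x in pool]
    chosen = set(select_queries(pool_probs, k))
    new_labeled = labeled + [(pool[i], oracle(pool[i])) for i in sorted(chosen)]
    new_pool = [x for i, x in enumerate(pool) if i not in chosen]
    return new_labeled, new_pool
```

Because each round labels exactly `k` examples, the labeling cost per round stays constant, and the selection criterion only changes which examples that budget is spent on.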

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Residual Connection · Adam · Dense Connections · Weight Decay