AUBER: Automated BERT Regularization

Hyun Dong Lee; Seongmin Lee; U Kang

arXiv:2009.14409 · cs.AI · September 15, 2021

TL;DR

AUBER introduces a reinforcement learning-based method to automatically prune attention heads in BERT, improving regularization and performance on NLP tasks, especially with limited training data.

Contribution

It proposes a novel RL-based approach for automatic attention head pruning in BERT, surpassing heuristic methods and enhancing regularization effectiveness.

Findings

01 Achieves up to 10% better accuracy than existing pruning methods.

02 Demonstrates the effectiveness of RL-based pruning through ablation studies.

03 Improves BERT's performance on downstream NLP tasks with limited data.

Abstract

How can we effectively regularize BERT? Although BERT has proven effective in various downstream natural language processing tasks, it often overfits when there are only a small number of training instances. A promising direction for regularizing BERT is to prune its attention heads according to a proxy score for head importance. However, heuristic-based methods are usually suboptimal since they predetermine the order in which attention heads are pruned. To overcome this limitation, we propose AUBER, an effective regularization method that leverages reinforcement learning to automatically prune attention heads from BERT. Instead of depending on heuristics or rule-based policies, AUBER learns a pruning policy that determines which attention heads should or should not be pruned for regularization. Experimental results show that AUBER outperforms existing pruning methods by…
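To make the contrast in the abstract concrete, here is a minimal sketch of the heuristic baseline it criticizes: heads are ranked once by a proxy importance score, and that ranking fixes the pruning order. This is not AUBER's learned policy. The function names, the toy scores, and the choice to model pruning as zeroing a head's output are all assumptions made for illustration.

```python
from typing import List, Sequence


def heuristic_prune_order(scores: Sequence[float]) -> List[int]:
    """Return head indices sorted from lowest to highest proxy score."""
    return sorted(range(len(scores)), key=lambda h: scores[h])


def head_mask(num_heads: int, pruned: Sequence[int]) -> List[int]:
    """Build a 0/1 mask where 0 marks a pruned head and 1 a kept head."""
    pruned_set = set(pruned)
    return [0 if h in pruned_set else 1 for h in range(num_heads)]


def apply_head_mask(
    head_outputs: Sequence[Sequence[float]], mask: Sequence[int]
) -> List[List[float]]:
    """Zero the output vector of every pruned head; kept heads pass through."""
    return [[x * m for x in out] for out, m in zip(head_outputs, mask)]


# Hypothetical proxy scores for a 4-head layer (illustrative values only).
scores = [0.9, 0.1, 0.5, 0.3]

# The order is fully determined by the scores before any pruning happens.
order = heuristic_prune_order(scores)

# Prune the two lowest-scoring heads.
mask = head_mask(len(scores), order[:2])

# Toy per-head outputs, one short vector per head.
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked = apply_head_mask(outputs, mask)
```

In this sketch the pruning order never changes once the scores are computed. That fixed order is the limitation the abstract attributes to heuristic methods, and a learned policy would instead choose which head to remove at each step.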

Taxonomy

Methods: Pruning · Linear Layer · Adam · Softmax · Dense Connections · Dropout · Linear Warmup With Linear Decay · Layer Normalization · Attention Dropout