How to Fine-Tune BERT for Text Classification?
Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

TL;DR
This paper investigates fine-tuning methods for BERT on text classification and distills them into a general solution that achieves state-of-the-art results on eight widely studied datasets.
Contribution
It offers a systematic analysis of BERT fine-tuning techniques and proposes a general approach that improves performance on text classification tasks.
Findings
Achieved new state-of-the-art results on eight datasets.
Identified effective fine-tuning strategies for BERT.
Provided practical guidelines for BERT adaptation.
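One strategy the full paper reports as effective is a layer-wise decreasing learning rate. The top encoder layer uses the base rate, and each lower layer uses the rate of the layer above it multiplied by a decay factor ξ ≤ 1, i.e. η^{k-1} = ξ · η^k. For BERT-base, the paper reports ξ = 0.95 with a base rate of 2e-5 as a good setting. The plain-Python sketch below illustrates the idea and is not the authors' code. Three choices in it are assumptions made for illustration: the parameter-name pattern (modeled on Hugging Face BERT names such as `bert.encoder.layer.3.attention.self.query.weight`), placing embeddings one level below layer 0, and giving all other parameters (pooler, classifier head) the base rate.

```python
import re
from collections import defaultdict


def layerwise_learning_rates(num_layers, base_lr=2e-5, decay=0.95):
    """Return one learning rate per encoder layer (index 0 = lowest layer).

    The top layer keeps base_lr; each lower layer is scaled by `decay`,
    i.e. eta^{k-1} = decay * eta^k.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    return [base_lr * decay ** (num_layers - 1 - k) for k in range(num_layers)]


# Hypothetical naming convention, modeled on Hugging Face BERT parameter names.
_LAYER_RE = re.compile(r"encoder\.layer\.(\d+)\.")


def build_param_groups(named_params, num_layers, base_lr=2e-5, decay=0.95):
    """Group (name, param) pairs into optimizer param groups with per-layer rates.

    Assumptions: embeddings sit one level below layer 0; every other
    parameter that is not inside an encoder layer uses base_lr.
    """
    rates = layerwise_learning_rates(num_layers, base_lr, decay)
    embedding_lr = rates[0] * decay
    buckets = defaultdict(list)  # learning rate -> parameters sharing it
    for name, param in named_params:
        match = _LAYER_RE.search(name)
        if match:
            lr = rates[int(match.group(1))]
        elif "embeddings" in name:
            lr = embedding_lr
        else:
            lr = base_lr
        buckets[lr].append(param)
    return [{"params": params, "lr": lr} for lr, params in buckets.items()]
```

With PyTorch, the returned list can be passed as the first argument of an optimizer, for example `torch.optim.AdamW(build_param_groups(model.named_parameters(), num_layers=12))`; each group's `lr` overrides the optimizer's default rate.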
Abstract
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
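Long documents are one of the settings the full paper examines. BERT accepts at most 512 tokens, and the paper reports that keeping both the head and the tail of a long text (the first 128 and the last 382 tokens) works better than keeping only the head or only the tail. The sketch below applies that truncation to an already-tokenized list. It assumes BERT's single-sentence `[CLS] ... [SEP]` wrapping, and the function names are hypothetical.

```python
def head_tail_truncate(tokens, head=128, tail=382):
    """Keep the first `head` and the last `tail` tokens of a sequence.

    Sequences that already fit in head + tail tokens are returned as a copy.
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    tokens = list(tokens)
    if len(tokens) <= head + tail:
        return tokens
    # Slice the tail from an explicit start index so tail == 0 keeps nothing
    # (tokens[-0:] would return the whole list).
    return tokens[:head] + tokens[len(tokens) - tail:]


def to_bert_input(tokens, max_len=512, head=128):
    """Head+tail truncation after reserving two slots for [CLS] and [SEP]."""
    budget = max_len - 2  # 510 content tokens for BERT's 512-token limit
    if not 0 <= head <= budget:
        raise ValueError("head must lie between 0 and max_len - 2")
    body = head_tail_truncate(tokens, head=head, tail=budget - head)
    return ["[CLS]"] + body + ["[SEP]"]
```

Setting `head=0` or `head=510` in `to_bert_input` recovers the tail-only and head-only baselines, so the three variants can be compared with one function.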
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Slanted Triangular Learning Rates · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
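Two of the learning-rate schedules in the Methods list can be written as plain functions of the training step. These are generic sketches, not the paper's training code. Linear warmup with linear decay rises from 0 to a peak rate and then falls back to 0; the 10% warmup fraction is an assumed default. Slanted triangular learning rates follow the formula introduced with ULMFiT (Howard and Ruder, 2018), using its suggested defaults cut_frac = 0.1 and ratio = 32.

```python
import math


def linear_warmup_linear_decay(step, total_steps, peak_lr, warmup_frac=0.1):
    """Linear ramp from 0 to peak_lr over the warmup steps, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    # Clamp at 0 so steps past total_steps never yield a negative rate.
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


def slanted_triangular(step, total_steps, max_lr, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate: short linear rise, long linear fall.

    Starts at max_lr / ratio, peaks at max_lr at step `cut`, and falls back
    to max_lr / ratio by the end of training (when total_steps * cut_frac >= 1).
    """
    cut = max(1, math.floor(total_steps * cut_frac))
    if step < cut:
        p = step / cut
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))
    p = max(p, 0.0)  # guard against flooring `cut` pushing p below 0
    return max_lr * (1 + p * (ratio - 1)) / ratio
```

Either function can drive a per-step update by writing the result into each optimizer param group's `lr` before the step. PyTorch's `torch.optim.lr_scheduler.LambdaLR` expects a multiplier on the initial rate rather than an absolute rate, so divide by the peak rate if you use it there.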
