BERTer: The Efficient One
Pradyumna Saligram, Andrew Lanpouthakoun

TL;DR
This paper introduces BERTer, an enhanced fine-tuning framework for BERT that combines regularization, architecture innovations, and early exiting to significantly improve performance and efficiency across various NLP tasks.
Contribution
It presents novel fine-tuning techniques including SMART regularization, a cross-embedding Siamese architecture, and early exiting methods to advance BERT's adaptability and performance.
Findings
Achieved state-of-the-art results on multiple NLP benchmarks.
Demonstrated substantial improvements in model efficiency and effectiveness.
Showcased the benefits of combining multiple fine-tuning architectures.
Abstract
We explore advanced fine-tuning techniques to boost BERT's performance on sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach uses SMART regularization to combat overfitting, refines hyperparameter choices, employs a cross-embedding Siamese architecture for better sentence embeddings, and introduces early exiting methods. Our findings show substantial improvements in model efficiency and effectiveness when multiple fine-tuning architectures are combined, achieving a state-of-the-art performance score on the test set, surpassing current benchmarks and highlighting BERT's adaptability in multifaceted linguistic tasks.
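To make the SMART regularization idea more concrete, here is a minimal sketch of a smoothness penalty: the symmetric KL divergence between a classifier's output on an input and on a slightly perturbed copy. It rests on several assumptions that do not come from the abstract. The function toy_model is a hypothetical linear classifier standing in for BERT, the perturbation is random noise rather than the adversarially chosen perturbation that SMART uses, and epsilon is an illustrative value. It is not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def toy_model(x, weights):
    """Hypothetical stand-in for the encoder plus classifier head:
    one weight row per class, followed by a softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def smoothness_penalty(x, weights, epsilon=1e-2, rng=None):
    """Penalty that grows when a small input perturbation changes the prediction.

    Assumption: uniform random noise in [-epsilon, epsilon] replaces the
    adversarial perturbation search used by SMART.
    """
    rng = rng or random.Random(0)
    noisy = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
    return symmetric_kl(toy_model(x, weights), toy_model(noisy, weights))
```

In training, a term like this would typically be scaled by a weighting coefficient and added to the task loss, so that the model is discouraged from changing its prediction sharply around each training example.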
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
Methods: Early exiting using confidence measures
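
Early exiting with a confidence measure can be illustrated with a short sketch. It assumes a classifier head attached to each encoder layer and uses the maximum softmax probability as the confidence measure. The function name early_exit_predict, the threshold value, and the choice of confidence measure are all illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(per_layer_logits, threshold=0.9):
    """Return (predicted_class, exit_layer, confidence).

    per_layer_logits is any iterable that yields one logit vector per layer,
    shallowest first. Passing a generator means deeper layers are only
    computed when the shallower ones are not confident enough.
    """
    result = None
    for layer_idx, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        confidence = max(probs)
        result = (probs.index(confidence), layer_idx, confidence)
        if confidence >= threshold:
            break  # confident enough: skip the remaining layers
    if result is None:
        raise ValueError("need logits from at least one layer")
    # If no layer reached the threshold, this is the final layer's prediction.
    return result


# Example with made-up logits for three layers and three classes.
layers = [[0.1, 0.2, 0.0], [0.0, 5.0, 0.0], [0.0, 9.0, 0.0]]
pred, exit_layer, conf = early_exit_predict(layers, threshold=0.9)
```

A lower threshold exits earlier and saves more computation at the cost of accepting less certain predictions. A higher threshold pushes more inputs through to the deeper layers.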
