BERTer: The Efficient One

Pradyumna Saligram; Andrew Lanpouthakoun

arXiv:2407.14039·cs.CL·July 22, 2024

TL;DR

This paper introduces BERTer, a fine-tuning framework for BERT that combines regularization, architectural changes, and early exiting to improve performance and efficiency on sentiment analysis, paraphrase detection, and semantic textual similarity.

Contribution

It combines SMART regularization, a cross-embedding Siamese architecture, and early exiting methods when fine-tuning BERT to improve its adaptability and performance.

Findings

01. Achieved state-of-the-art results on multiple NLP benchmarks.

02. Demonstrated substantial improvements in model efficiency and effectiveness.

03. Showcased the benefits of combining multiple fine-tuning architectures.

Abstract

We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach uses SMART regularization to combat overfitting, refines hyperparameter choices, employs a cross-embedding Siamese architecture for better sentence embeddings, and introduces early exiting methods. Our findings show substantial improvements in model efficiency and effectiveness when multiple fine-tuning architectures are combined, achieving a state-of-the-art performance score on the test set, surpassing current benchmarks and highlighting BERT's adaptability in multifaceted linguistic tasks.
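
The early exiting idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical classifier head attached to each encoder layer and a softmax-confidence threshold as the exit rule. The names `early_exit`, `layer_logits`, and `threshold` are illustrative and are not taken from the paper, whose exact exit criterion is not given on this page.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit(
    layer_logits: Iterable[Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Return (exit_layer, predicted_class, confidence).

    Assumption: layer_logits yields, layer by layer, the logits of a
    hypothetical classifier head attached to that layer. Passing a
    generator means deeper layers are never evaluated once the exit
    condition is met.
    """
    result = None
    for layer, logits in enumerate(layer_logits):
        probs = softmax(logits)
        confidence = max(probs)
        result = (layer, probs.index(confidence), confidence)
        if confidence >= threshold:
            # Confident enough: stop here and skip the remaining layers.
            return result
    if result is None:
        raise ValueError("layer_logits must contain at least one layer")
    # No layer reached the threshold: fall back to the final layer.
    return result


# Made-up logits for a 3-layer model on a 2-class task.
example = [[0.1, 0.2], [0.0, 3.0], [0.0, 10.0]]
layer, label, conf = early_exit(example, threshold=0.9)
print(layer, label, round(conf, 3))
```

A lower threshold exits sooner and saves more computation at the risk of less reliable predictions, while a higher threshold defers to deeper layers; the value 0.9 here is arbitrary.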

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies

Methods: Early exiting using confidence measures