Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT
Giuliano Lorenzoni, Ivens Portugal, Paulo Alencar, Donald Cowan

TL;DR
This paper investigates how hyperparameter choices affect the performance variability of DistilBERT in text classification, emphasizing the importance of tuning strategies that consider complex interactions to optimize model outcomes.
Contribution
It provides a detailed analysis of hyperparameter impacts on DistilBERT's performance, highlighting the significance of non-linear interactions for effective fine-tuning strategies.
Findings
Hyperparameters significantly influence accuracy, F1-score, and loss.
Interactions between hyperparameters like epochs and batch size are crucial.
Trade-offs exist among different performance metrics based on hyperparameter settings.
Abstract
This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics due to hyperparameter configurations, showing trade-offs among performance metrics. For example, a higher learning rate reduces loss in relative analysis (p=0.027) but challenges accuracy improvements. Meanwhile, batch size significantly impacts accuracy and F1-score in absolute regression (p=0.028 and p=0.005) but has limited influence on loss optimization…
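The polynomial regression mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact specification, so everything here is an assumption made for illustration: the degree-2 feature set with pairwise interaction terms, the z-score standardization, and the names `standardize`, `quadratic_features`, and `fit_quadratic`. The sketch only estimates coefficients by ordinary least squares; it does not compute the p-values the abstract reports.

```python
import numpy as np


def standardize(configs):
    """Z-score each hyperparameter column (learning rate, batch size, epochs).

    Standardizing is an assumption of this sketch: it keeps the tiny
    learning-rate values from making the least-squares problem ill-conditioned.
    """
    configs = np.asarray(configs, dtype=float)
    return (configs - configs.mean(axis=0)) / configs.std(axis=0)


def quadratic_features(z):
    """Build degree-2 polynomial features with pairwise interactions.

    Column order: 1, lr, bs, ep, lr^2, bs^2, ep^2, lr*bs, lr*ep, bs*ep.
    """
    lr, bs, ep = z[:, 0], z[:, 1], z[:, 2]
    return np.column_stack([
        np.ones(len(z)),
        lr, bs, ep,                 # linear (main) effects
        lr**2, bs**2, ep**2,        # curvature terms
        lr * bs, lr * ep, bs * ep,  # interaction terms
    ])


def fit_quadratic(z, metric):
    """Ordinary least-squares fit of one metric (e.g. accuracy) on the features."""
    X = quadratic_features(z)
    coef, *_ = np.linalg.lstsq(X, np.asarray(metric, dtype=float), rcond=None)
    return coef
```

A large coefficient on the `bs*ep` column would correspond to the kind of epochs and batch size interaction listed under Findings. Fitting all ten coefficients needs at least ten distinct configurations, with at least three levels per hyperparameter so the squared terms are identifiable.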
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Attention Is All You Need · Layer Normalization · Attention Dropout · Linear Layer · Softmax · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Dropout
