Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT

Giuliano Lorenzoni; Ivens Portugal; Paulo Alencar; Donald Cowan

arXiv:2501.00241 · cs.CL · January 3, 2025

TL;DR

This paper investigates how hyperparameter choices affect the performance variability of DistilBERT in text classification, emphasizing tuning strategies that account for complex interactions among hyperparameters to optimize model outcomes.

Contribution

The paper provides a detailed analysis of how hyperparameters affect DistilBERT's performance, highlighting the role of non-linear interactions in effective fine-tuning strategies.

Findings

1. Hyperparameters significantly influence accuracy, F1-score, and loss.
2. Interactions between hyperparameters, such as epochs and batch size, are crucial.
3. Trade-offs exist among performance metrics depending on hyperparameter settings.
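An interaction such as epochs × batch size can be modeled by adding a product column to a degree-2 polynomial design matrix, so that the effect of one hyperparameter is allowed to depend on the value of another. The following is a minimal NumPy sketch. The function name `quadratic_features` and the example hyperparameter values are hypothetical and do not come from the paper.

```python
from itertools import combinations_with_replacement

import numpy as np


def quadratic_features(X: np.ndarray, names: list[str]) -> tuple[np.ndarray, list[str]]:
    """Expand hyperparameter columns into a degree-2 polynomial design matrix.

    Columns: intercept, each main effect, each squared term, and each
    pairwise interaction (e.g. batch_size*epochs).
    """
    n_samples, n_features = X.shape
    cols = [np.ones(n_samples)]
    col_names = ["intercept"]
    # Main (linear) effects.
    for j in range(n_features):
        cols.append(X[:, j])
        col_names.append(names[j])
    # Squared terms (i == j) and pairwise interactions (i < j).
    for i, j in combinations_with_replacement(range(n_features), 2):
        cols.append(X[:, i] * X[:, j])
        col_names.append(f"{names[i]}^2" if i == j else f"{names[i]}*{names[j]}")
    return np.column_stack(cols), col_names


# Hypothetical configurations: (learning_rate, batch_size, epochs).
X = np.array([[2e-5, 16, 2], [5e-5, 32, 3]])
design, labels = quadratic_features(X, ["learning_rate", "batch_size", "epochs"])
print(labels)
```

With three hyperparameters, the matrix has 10 columns: an intercept, 3 main effects, 3 squared terms, and 3 pairwise interactions. The `batch_size*epochs` column is the term through which an epochs/batch-size interaction would enter a regression.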

Abstract

This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics due to hyperparameter configurations, showing trade-offs among performance metrics. For example, a higher learning rate reduces loss in the relative analysis (p=0.027) but challenges accuracy improvements. Meanwhile, batch size significantly impacts accuracy and F1-score in the absolute regression (p=0.028 and p=0.005, respectively) but has limited influence on loss optimization…
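The sketch below shows one way to run the kind of regression the abstract describes. It assumes ordinary least squares on a reduced polynomial model with main effects for batch size and epochs plus their interaction. The data are synthetic and generated inside the snippet; they are not the paper's results. The helper name `fit_ols` and the baseline score of 0.90 are hypothetical. Treating the "relative" analysis as a regression on the difference from the baseline score is also an assumption.

```python
import numpy as np


def fit_ols(design: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Ordinary least squares: return coefficients and their t-statistics."""
    n, p = design.shape
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - p)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats


rng = np.random.default_rng(0)
n_runs = 40
# Hypothetical experiment: random batch sizes and epoch counts per run.
batch_size = rng.choice([8, 16, 32, 64], size=n_runs).astype(float)
epochs = rng.choice([1, 2, 3, 4], size=n_runs).astype(float)
# Synthetic accuracy with a batch_size*epochs interaction (NOT data from the paper).
accuracy = (
    0.85
    + 0.002 * batch_size
    + 0.01 * epochs
    - 0.0005 * batch_size * epochs
    + rng.normal(0.0, 0.002, n_runs)
)

# Columns: intercept, batch_size, epochs, batch_size*epochs.
design = np.column_stack([np.ones(n_runs), batch_size, epochs, batch_size * epochs])
baseline_accuracy = 0.90  # hypothetical score of the baseline model

# Absolute analysis: regress the raw metric.
beta_abs, t_abs = fit_ols(design, accuracy)
# Relative analysis (assumed form): regress the change from the baseline.
beta_rel, t_rel = fit_ols(design, accuracy - baseline_accuracy)
print("interaction coefficient:", beta_abs[3], "t =", t_abs[3])
```

Because the baseline here is a single constant, the relative fit differs from the absolute one only in its intercept, so the slope t-statistics are identical. Two-sided p-values comparable to those quoted in the abstract could be computed from the t-statistics with `2 * scipy.stats.t.sf(abs(t), n - p)`.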

Taxonomy

Topics: Text and Document Classification Technologies

Methods: Attention Is All You Need · Layer Normalization · Attention Dropout · Linear Layer · Softmax · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Dropout