Dynamic Scaled Gradient Descent for Stable Fine-Tuning for Classifications
Nghia Bui, Lijing Wang

TL;DR
This paper introduces dynamic scaled gradient descent, a gradient scaling method that stabilizes fine-tuning of pretrained models by reducing gradient cancellation across training examples, which in turn improves accuracy.
Contribution
The paper proposes dynamic scaled gradient descent, a new algorithm that dynamically scales down the gradients of correctly classified examples to enhance training stability during fine-tuning.
Findings
Dynamic scaled gradient descent reduces performance variance across datasets.
It consistently outperforms existing fine-tuning methods.
The approach improves training stability for large pretrained models.
Abstract
Fine-tuning pretrained models has become a standard approach to adapting pretrained knowledge to improve accuracy on new sparse, imbalanced datasets. However, issues arise when optimization falls into a collapsed state, where the model gets stuck, leading to degraded performance and unstable training. One possible reason for this is the cancellation of gradients across training examples. To address this problem, we propose a novel algorithm, dynamic scaled gradient descent, that directly modifies the gradients returned by training examples. Specifically, it scales down the gradients of correctly classified examples using a dynamic scaler. This strategy offers both theoretical and empirical advantages in improving training stability. Experiments on a variety of benchmark datasets, spanning multiple tasks and large pretrained models, demonstrate that our method consistently…
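The idea of scaling down gradients from correctly classified examples can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the dynamic scaler, so the `error_rate_scaler` function below (the fraction of the batch that is currently misclassified) is a hypothetical stand-in, and all function names are illustrative. The gradient is taken with respect to a shared bias vector added to each example's logits, for which the per-example cross-entropy gradient is softmax(logits) minus the one-hot label.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_gradient(logits: Sequence[float], label: int) -> List[float]:
    # Cross-entropy gradient w.r.t. a shared bias added to the logits:
    # softmax(logits) - one_hot(label).
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def is_correct(logits: Sequence[float], label: int) -> bool:
    # An example is correctly classified if its arg-max logit is the label.
    return max(range(len(logits)), key=lambda k: logits[k]) == label


def error_rate_scaler(correct: Sequence[bool]) -> float:
    # ASSUMPTION: a hypothetical dynamic scaler, not taken from the paper.
    # It equals the fraction of the batch that is misclassified.
    return 1.0 - sum(correct) / len(correct)


def scaled_mean_gradient(batch_logits: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> List[float]:
    # Average per-example gradients, down-weighting correct examples.
    correct = [is_correct(z, y) for z, y in zip(batch_logits, labels)]
    scaler = error_rate_scaler(correct)
    total = [0.0] * len(batch_logits[0])
    for z, y, ok in zip(batch_logits, labels, correct):
        weight = scaler if ok else 1.0  # misclassified examples keep full weight
        for k, g in enumerate(bias_gradient(z, y)):
            total[k] += weight * g
    return [t / len(labels) for t in total]


if __name__ == "__main__":
    # One correctly classified and one misclassified example, both with label 0.
    print(scaled_mean_gradient([[2.0, 0.0], [0.0, 2.0]], [0, 0]))
```

With this particular scaler, the update is dominated by misclassified examples, and the contribution of correct examples shrinks as the batch error rate falls. Other choices of scaler would behave differently.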
