Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning
Yaqian Zhang, Bernhard Pfahringer, Eibe Frank, Albert Bifet

TL;DR
This paper introduces a reinforcement-learning-inspired approach to fine-tuning pretrained vision models that directly minimizes misclassification error (the 0-1 loss), and uses adaptive entropy annealing to improve class-incremental continual learning performance.
Contribution
It proposes a novel adaptive entropy annealing strategy (aEPG) that transitions from exploration to exploitation while directly optimizing the true classification loss (the 0-1 loss) during vision model fine-tuning.
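The summary does not spell out how aEPG's adaptive schedule works, so the sketch below is only a generic stand-in to make "exploration to exploitation" concrete. It assumes an objective of expected 0-1 loss (1 - p_y) minus an entropy bonus whose coefficient decays linearly to zero. The linear schedule, `beta0`, and the helper names `entropy_coef` and `annealed_epg_loss` are hypothetical and are not the authors' method.

```python
import numpy as np

def softmax(logits):
    # Row-wise, numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_coef(step, total_steps, beta0=0.1):
    # HYPOTHETICAL linear schedule: beta0 at step 0, reaching 0 at total_steps.
    # A stand-in for the paper's adaptive schedule, which is not described here.
    return beta0 * max(0.0, 1.0 - step / total_steps)

def annealed_epg_loss(logits, labels, step, total_steps, beta0=0.1):
    # Expected 0-1 loss (1 - p_y) minus an entropy bonus that fades over training:
    # a large bonus early favors exploration, and zero bonus late is pure exploitation.
    p = softmax(logits)
    p_y = p[np.arange(len(labels)), labels]
    expected_01 = 1.0 - p_y
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    beta = entropy_coef(step, total_steps, beta0)
    return float(np.mean(expected_01 - beta * entropy))

logits = np.array([[2.0, 0.0, -1.0], [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
for step in (0, 500, 1000):
    print(step, annealed_epg_loss(logits, labels, step, total_steps=1000))
```

At the final step the entropy term vanishes and the objective reduces to the mean expected 0-1 loss.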
Findings
aEPG outperforms traditional CE-based fine-tuning methods across multiple benchmarks
Lower entropy regularization improves adaptation in pretrained vision models
Reformulating classification as a Markov Decision Process enables direct error minimization
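In a one-step MDP, the state is an input image, the action is a predicted label, and the reward can be taken as 1 when the prediction equals the true label y and 0 otherwise. The expected reward is then simply p_y = π(y | x), so maximizing it minimizes the expected 0-1 loss. The sketch below, a minimal illustration under these assumptions and not the authors' code, compares a sampled REINFORCE estimate with the exact expectation over all actions, which has no sampling noise. The helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a single logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(logits, a):
    # Gradient of log softmax(z)[a] w.r.t. the logits: onehot(a) - softmax(z).
    g = -softmax(logits)
    g[a] += 1.0
    return g

def expected_reward_grad(logits, y):
    # Exact gradient of J = sum_a pi(a) * 1[a == y] = pi(y).
    # Summing over every action removes sampling noise: pi(y) * (onehot(y) - pi).
    p = softmax(logits)
    return p[y] * grad_log_pi(logits, y)

def reinforce_grad(logits, y, rng, n_samples):
    # Monte Carlo score-function estimate: mean of r(a) * grad log pi(a), a ~ pi.
    p = softmax(logits)
    actions = rng.choice(len(p), size=n_samples, p=p)
    grads = [float(a == y) * grad_log_pi(logits, a) for a in actions]
    return np.mean(grads, axis=0)

logits = np.array([2.0, 0.5, -1.0, 0.0])
y = 1
rng = np.random.default_rng(0)
exact = expected_reward_grad(logits, y)
mc = reinforce_grad(logits, y, rng, 20000)
print(np.round(exact, 3))
print(np.round(mc, 3))
```

The sampled estimate only approaches the exact gradient after many draws, which is the variance that an expected-gradient formulation avoids.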
Abstract
Despite their success, large pretrained vision models remain vulnerable to catastrophic forgetting when adapted to new tasks in class-incremental settings. Parameter-efficient fine-tuning (PEFT) alleviates this by restricting trainable parameters, yet most approaches still rely on cross-entropy (CE) loss, a surrogate for the 0-1 loss, to learn from new data. We revisit this choice and revive the true objective (0-1 loss) through a reinforcement learning perspective. By formulating classification as a one-step Markov Decision Process, we derive an Expected Policy Gradient (EPG) method that directly minimizes misclassification error with a low-variance gradient estimation. Our analysis shows that CE can be interpreted as EPG with an additional sample-weighting mechanism: CE encourages exploration by emphasizing low-confidence samples, while EPG prioritizes high-confidence ones. Building…
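The sample-weighting claim follows from one line of calculus: ∇(−log p_y) = −(1/p_y)∇p_y. The CE gradient is therefore the gradient of the expected 0-1 loss (1 − p_y), scaled by 1/p_y. That factor is large for low-confidence samples (exploration) and close to 1 for confident ones. The check below is a minimal sketch of this identity with respect to the logits. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a single logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_grad(logits, y):
    # d/dz [-log p_y] = p - onehot(y)
    p = softmax(logits)
    g = p.copy()
    g[y] -= 1.0
    return g

def epg_grad(logits, y):
    # d/dz [1 - p_y] = -p_y * (onehot(y) - p) = p_y * (p - onehot(y))
    p = softmax(logits)
    g = p.copy()
    g[y] -= 1.0
    return p[y] * g

# A confident sample and an unconfident sample for true class 0.
for logits in (np.array([4.0, 0.0, 0.0]), np.array([0.0, 0.0, 2.0])):
    p_y = softmax(logits)[0]
    ratio = np.linalg.norm(ce_grad(logits, 0)) / np.linalg.norm(epg_grad(logits, 0))
    print(f"p_y={p_y:.3f}  CE/EPG gradient norm ratio={ratio:.2f}")
```

The printed ratio equals 1/p_y, so CE amplifies the update for the unconfident sample far more than for the confident one.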
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
