Annotation Entropy Predicts Per-Example Learning Dynamics in LoRA Fine-Tuning
Brady Steele

TL;DR
This paper shows that LoRA fine-tuning can cause un-learning (loss that rises during training) on examples with high annotator disagreement, and that annotation entropy predicts per-example learning dynamics across six models and two datasets (SNLI and MNLI).
Contribution
It introduces annotation entropy as a predictor of per-example learning behavior in LoRA fine-tuning, revealing an un-learning pattern that is largely absent under full fine-tuning.
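To make the predictor concrete, here is a minimal sketch of annotation entropy as the Shannon entropy of one example's empirical label distribution. The function name, the log base (bits), and the label strings are assumptions chosen for illustration; the paper may use a different base or normalization.

```python
import math
from collections import Counter


def annotation_entropy(labels, base=2.0):
    """Shannon entropy of the empirical label distribution for one example.

    `labels` is the list of annotator labels for a single item, for example
    the 100 labels ChaosNLI provides per example. The base (bits) is an
    assumption, not taken from the paper.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


# Toy inputs (not data from the paper): a unanimous item and a contested one.
unanimous = ["entailment"] * 100
contested = ["entailment"] * 50 + ["neutral"] * 50
```

Under this definition a unanimous item has zero entropy and an even two-way split has an entropy of 1 bit, so higher values mark more contested examples.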
Findings
Annotation entropy correlates positively with per-example area under the loss curve (AULC) in all 25 conditions tested; high-entropy items show loss that increases during LoRA fine-tuning.
Decoder-only models show stronger correlations than encoder models at matched LoRA rank.
The correlation replicates across datasets (SNLI and MNLI) and seeds, and survives partial-correlation controls.
Abstract
We find that LoRA fine-tuning exhibits un-learning on contested examples: items with high annotator disagreement show increasing loss during training, a qualitatively distinct pattern largely absent under full fine-tuning and consistent across all six models tested (four encoder, two decoder-only). This discovery emerges from correlating annotation entropy, computed from ChaosNLI's 100 labels per example, with per-example area under the loss curve (AULC) on SNLI and MNLI. The correlation is positive in all 25 conditions tested (Spearman -), with decoder-only models showing stronger correlations than encoders at matched LoRA rank. The effect survives partial-correlation controls and replicates across seeds and datasets. A preliminary noise-injection experiment is consistent with these findings.
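The analysis described in the abstract can be sketched as follows: compute an AULC per example from its recorded losses, then rank-correlate AULC with annotation entropy. This is an illustrative sketch under stated assumptions, not the author's implementation. The trapezoidal rule, the unit spacing between checkpoints, and all function names are assumptions; the paper may define AULC differently (for instance as a mean loss).

```python
import math


def aulc(losses, steps=None):
    """Trapezoidal area under one example's loss curve.

    `losses` holds the example's loss at successive checkpoints. `steps`
    defaults to 0, 1, 2, ... (unit spacing, an assumption).
    """
    if steps is None:
        steps = list(range(len(losses)))
    if len(losses) != len(steps) or len(losses) < 2:
        raise ValueError("need matching lengths and at least two points")
    return sum(
        0.5 * (losses[i] + losses[i + 1]) * (steps[i + 1] - steps[i])
        for i in range(len(losses) - 1)
    )


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Toy values (not results from the paper): entropy per example and the
# loss recorded for that example at three checkpoints.
toy_entropy = [0.0, 0.5, 1.0, 1.5]
toy_curves = [[1.0, 0.4, 0.1], [1.0, 0.7, 0.5], [1.0, 1.0, 1.1], [1.0, 1.3, 1.6]]
toy_aulc = [aulc(curve) for curve in toy_curves]
toy_rho = spearman(toy_entropy, toy_aulc)
```

In the toy data the last two curves rise over training, which is the un-learning pattern the abstract describes. Because a rising curve accumulates more area than a falling one, AULC orders these examples the same way entropy does.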
