When Fine-Tuning Fails: Lessons from MS MARCO Passage Ranking
Manu Pande, Shahil Kumar, Anay Yatin Damle

TL;DR
This study reveals that fine-tuning pre-trained transformer models on MS MARCO passage ranking can degrade performance, highlighting the importance of preserving the original embedding space learned during extensive pre-training.
Contribution
The paper demonstrates that fine-tuning approaches often harm performance on saturated benchmarks and provides analysis showing how fine-tuning disrupts the pre-trained embedding space.
Findings
Fine-tuning degrades MS MARCO ranking performance relative to the base model.
The embedding space becomes flatter after fine-tuning, as shown by UMAP visualizations.
Fine-tuning disrupts the optimal structure learned during pre-training.
Abstract
This paper investigates the counterintuitive phenomenon where fine-tuning pre-trained transformer models degrades performance on the MS MARCO passage ranking task. Through comprehensive experiments involving five model variants, including full-parameter fine-tuning and parameter-efficient LoRA adaptations, we demonstrate that all fine-tuning approaches underperform the base sentence-transformers/all-MiniLM-L6-v2 model (MRR@10: 0.3026). Our analysis reveals that fine-tuning disrupts the optimal embedding space structure learned during the base model's extensive pre-training on 1 billion sentence pairs, including 9.1 million MS MARCO samples. UMAP visualizations show progressive embedding space flattening, while training dynamics analysis and computational efficiency metrics further support our findings. These results challenge conventional wisdom about transfer learning effectiveness on…
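The abstract reports its headline result as MRR@10. For readers unfamiliar with the metric, the following is a minimal sketch of how MRR@10 is commonly computed: for each query, take the reciprocal of the rank of the first relevant passage within the top 10 (or 0 if none appears), then average over queries. The function name `mrr_at_k`, the dictionary layout, and the toy query and passage IDs are illustrative assumptions and are not taken from the paper's code or data.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank truncated at k.

    rankings: query ID -> passage IDs ordered from best to worst (hypothetical layout).
    relevant: query ID -> set of passage IDs judged relevant.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        # Only the first relevant hit inside the top k contributes.
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in rel:
                total += 1.0 / rank
                break
    # Queries with no relevant passage in the top k contribute 0.
    return total / len(rankings)


# Toy data, invented purely for illustration.
rankings = {
    "q1": ["p3", "p1", "p7"],  # first relevant passage at rank 2
    "q2": ["p9", "p4", "p2"],  # first relevant passage at rank 1
    "q3": ["p5", "p6", "p8"],  # no relevant passage retrieved
}
relevant = {"q1": {"p1"}, "q2": {"p9"}, "q3": {"p0"}}

score = mrr_at_k(rankings, relevant, k=10)
print(f"MRR@10 = {score:.4f}")
```

On this toy input the three queries contribute 1/2, 1, and 0, so the mean is 0.5. A reported value such as 0.3026 can be read the same way: an average over all evaluation queries of the reciprocal rank of the first relevant passage in the top 10.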
