When Fine-Tuning Fails: Lessons from MS MARCO Passage Ranking

Manu Pande; Shahil Kumar; Anay Yatin Damle

arXiv:2506.18535·cs.CL·June 24, 2025

TL;DR

This study reveals that fine-tuning pre-trained transformer models on MS MARCO passage ranking can degrade performance, highlighting the importance of preserving the original embedding space learned during extensive pre-training.

Contribution

The paper demonstrates that fine-tuning approaches often harm performance on saturated benchmarks and provides analysis showing how fine-tuning disrupts the pre-trained embedding space.

Findings

01

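Every fine-tuned variant, whether full-parameter or LoRA, underperforms the base sentence-transformers/all-MiniLM-L6-v2 model on MS MARCO passage ranking (base MRR@10: 0.3026).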

02

UMAP visualizations show the embedding space becoming progressively flatter after fine-tuning.

03

Fine-tuning disrupts the optimal structure learned during pre-training.

Abstract

This paper investigates the counterintuitive phenomenon where fine-tuning pre-trained transformer models degrades performance on the MS MARCO passage ranking task. Through comprehensive experiments involving five model variants, including full-parameter fine-tuning and parameter-efficient LoRA adaptations, we demonstrate that all fine-tuning approaches underperform the base sentence-transformers/all-MiniLM-L6-v2 model (MRR@10: 0.3026). Our analysis reveals that fine-tuning disrupts the optimal embedding space structure learned during the base model's extensive pre-training on 1 billion sentence pairs, including 9.1 million MS MARCO samples. UMAP visualizations show progressive embedding space flattening, while training dynamics analysis and computational efficiency metrics further support our findings. These results challenge conventional wisdom about transfer learning effectiveness on…
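
The abstract reports results as MRR@10 (mean reciprocal rank, cut off at rank 10). For readers unfamiliar with the metric, the following is a minimal sketch of how it can be computed. It is not the authors' evaluation code: the function name `mrr_at_k`, the dictionary layout, and the toy query and passage IDs are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k.

    rankings: query ID -> passage IDs ordered best-first (hypothetical layout).
    relevant: query ID -> set of passage IDs judged relevant.
    A query with no relevant passage in its top k contributes 0.
    """
    if not rankings:
        return 0.0

    total = 0.0
    for query_id, ranked_passages in rankings.items():
        gold = relevant.get(query_id, set())
        # Ranks are 1-based; only the first relevant hit counts.
        for rank, passage_id in enumerate(ranked_passages[:k], start=1):
            if passage_id in gold:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data, invented for illustration only.
toy_rankings = {
    "q1": ["p3", "p1", "p7"],  # first relevant passage at rank 2
    "q2": ["p9", "p8"],        # first relevant passage at rank 1
    "q3": ["p0", "p4"],        # no relevant passage retrieved
}
toy_relevant = {"q1": {"p1"}, "q2": {"p9"}, "q3": {"p5"}}

score = mrr_at_k(toy_rankings, toy_relevant, k=10)
print(f"MRR@10 = {score:.4f}")
```

On the toy data the three queries contribute 1/2, 1, and 0, so the mean is 0.5. Under this definition, a reported MRR@10 of 0.3026 is the average of such per-query reciprocal ranks over the evaluation queries.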
