Comparative Analysis of Lion and AdamW Optimizers for Cross-Encoder Reranking with MiniLM, GTE, and ModernBERT
Shahil Kumar, Manu Pande, Anay Yatin Damle

TL;DR
This study compares the Lion and AdamW optimizers for fine-tuning cross-encoder rerankers across three transformer models (MiniLM, GTE, and ModernBERT), evaluated on TREC DL 2019 and the MS MARCO dev set, and reports gains for Lion in both ranking effectiveness and GPU efficiency.
Contribution
It presents a comprehensive analysis of the Lion optimizer's impact on cross-encoder reranking models, demonstrating advantages over AdamW in ranking effectiveness and GPU utilization.
Findings
The Lion optimizer improves NDCG@10 and MAP scores relative to AdamW.
Lion improves GPU utilization efficiency by up to 10.33%.
ModernBERT with Lion achieves top performance on TREC DL 2019.
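For readers unfamiliar with the metrics named in these findings, the following is a minimal sketch of NDCG@10 and average precision for a single query (MAP is the mean of average precision over queries). It is not the paper's evaluation code. The function names are hypothetical, the linear-gain form of DCG is one common convention (some tools use an exponential gain instead), and both functions assume the input list contains every judged document for the query, which simplifies the ideal ranking and the relevant-document count.

```python
import math

def ndcg_at_k(relevance, k=10):
    """NDCG@k for one query; `relevance` holds graded labels in ranked order.

    Assumption: linear gain, and the ideal ranking is built from this list only.
    """
    def dcg(rels):
        # Discount each label by log2 of (rank + 1), with rank starting at 1.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0

def average_precision(relevance):
    """Average precision for one query, treating any label > 0 as relevant.

    Assumption: all relevant documents appear somewhere in `relevance`.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0
```

A perfectly ordered list scores an NDCG of 1.0, and swapping a relevant document lower in the ranking reduces both metrics.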
Abstract
Modern information retrieval systems often employ a two-stage pipeline: an efficient initial retrieval stage followed by a computationally intensive reranking stage. Cross-encoders have shown strong effectiveness for reranking due to their deep analysis of query-document pairs. This paper studies the impact of the Lion optimizer, a recent alternative to AdamW, during fine-tuning of cross-encoder rerankers. We fine-tune three transformer models (MiniLM, GTE, and ModernBERT) on the MS MARCO passage ranking dataset using both optimizers. GTE and ModernBERT support extended context lengths (up to 8192 tokens). We evaluate effectiveness on the TREC 2019 Deep Learning Track and the MS MARCO dev set (MRR@10). Experiments, run on the Modal cloud platform, reveal that ModernBERT with Lion achieves the best NDCG@10 (0.7225) and MAP (0.5121) on TREC DL 2019, while MiniLM with Lion ties ModernBERT for…
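To make the optimizer under study concrete, here is a minimal NumPy sketch of a single Lion update as the algorithm is generally described: the step direction is the sign of an interpolation between the momentum and the current gradient, with decoupled weight decay. The function name `lion_step` and the default hyperparameters are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update for a single parameter array (illustrative defaults)."""
    # Blend momentum and gradient, then keep only the sign, so every
    # coordinate moves by the same magnitude (lr) regardless of gradient scale.
    update = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    # Decoupled weight decay, applied directly to the parameters as in AdamW.
    new_param = param - lr * (update + weight_decay * param)
    # The momentum is refreshed with a separate coefficient, beta2.
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_param, new_momentum

# Tiny usage example with made-up values.
p = np.array([1.0, -2.0])
g = np.array([0.5, -0.1])
m = np.zeros_like(p)
p_next, m_next = lion_step(p, g, m)
```

Unlike AdamW, this update tracks only one state array per parameter (the momentum) rather than first and second moment estimates, which is commonly cited as the source of Lion's lower optimizer memory footprint.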
