Comparative Analysis of Lion and AdamW Optimizers for Cross-Encoder Reranking with MiniLM, GTE, and ModernBERT

Shahil Kumar; Manu Pande; Anay Yatin Damle

arXiv:2506.18297 · cs.IR · June 24, 2025

TL;DR

This study compares the Lion and AdamW optimizers for fine-tuning cross-encoder rerankers on three transformer models (MiniLM, GTE, and ModernBERT), evaluated on TREC DL 2019 and the MS MARCO dev set. It reports that Lion gives higher ranking effectiveness and better GPU utilization.

Contribution

It presents a comparative analysis of the Lion optimizer's impact on cross-encoder reranking models and reports advantages over AdamW in both ranking effectiveness and GPU utilization.

Findings

01  Lion optimizer improves NDCG@10 and MAP scores.

02  Lion enhances GPU efficiency by up to 10.33%.

03  ModernBERT with Lion achieves top performance on TREC DL 2019.
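
The findings above are stated in terms of NDCG@10 and MAP. As a reminder of what those numbers measure, here is a minimal sketch of both metrics for a single query, assuming graded relevance labels listed in ranked order. The function names, the linear-gain form of DCG, and the `min_rel` threshold used to binarize labels for average precision are illustrative assumptions; the listing does not say which evaluation tool or variant the authors used.

```python
import math


def dcg_at_k(ranked_rels, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ranked_rels[:k], start=1))


def ndcg_at_k(ranked_rels, k=10):
    """DCG normalized by the DCG of the ideal ordering of the same labels."""
    ideal_dcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(ranked_rels, min_rel=1):
    """Average of precision values at each rank holding a relevant result.

    min_rel is an assumed threshold for treating a graded label as relevant.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel >= min_rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(per_query_rels, min_rel=1):
    """MAP: average precision averaged over all queries."""
    scores = [average_precision(rels, min_rel) for rels in per_query_rels]
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with made-up labels (not data from the paper).
example = [3, 0, 2, 0, 1]
example_ndcg = ndcg_at_k(example, k=10)
example_map = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```

This sketch normalizes by the labels present in the ranked list only. A full evaluation would compute the ideal ordering and the relevant-document count from the complete judgment file.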

Abstract

Modern information retrieval systems often employ a two-stage pipeline: an efficient initial retrieval stage followed by a computationally intensive reranking stage. Cross-encoders have shown strong effectiveness for reranking due to their deep analysis of query-document pairs. This paper studies the impact of the Lion optimizer, a recent alternative to AdamW, during fine-tuning of cross-encoder rerankers. We fine-tune three transformer models (MiniLM, GTE, and ModernBERT) on the MS MARCO passage ranking dataset using both optimizers. GTE and ModernBERT support extended context lengths (up to 8192 tokens). We evaluate effectiveness using the TREC 2019 Deep Learning Track and the MS MARCO dev set (MRR@10). Experiments, run on the Modal cloud platform, reveal that ModernBERT with Lion achieves the best NDCG@10 (0.7225) and MAP (0.5121) on TREC DL 2019, while MiniLM with Lion ties ModernBERT for…
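
To show how the two optimizers compared in the abstract differ mechanically, here is a minimal sketch of a single update step for each, written over plain Python lists. It follows the published Lion and AdamW update rules as commonly stated. The function names, the toy values, and the learning rate are illustrative assumptions, not the authors' training configuration.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update; keeps a single momentum value per parameter."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction is the sign of an interpolation of momentum and gradient.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is added to the update, as in AdamW.
        new_params.append(p - lr * (direction + weight_decay * p))
        # Momentum is then refreshed with a second interpolation factor.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


def adamw_step(params, grads, exp_avg, exp_avg_sq, step, lr=1e-4,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One AdamW update; keeps two moment estimates per parameter."""
    new_params, new_avg, new_avg_sq = [], [], []
    for p, g, m, v in zip(params, grads, exp_avg, exp_avg_sq):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction; step counts from 1.
        m_hat = m / (1 - beta1 ** step)
        v_hat = v / (1 - beta2 ** step)
        update = m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p - lr * (update + weight_decay * p))
        new_avg.append(m)
        new_avg_sq.append(v)
    return new_params, new_avg, new_avg_sq


# Toy usage with made-up values (lr chosen only to make the step visible).
params = [1.0, -2.0]
grads = [0.5, -0.25]
lion_params, lion_m = lion_step(params, grads, [0.0, 0.0], lr=0.1)
adamw_params, adamw_m, adamw_v = adamw_step(
    params, grads, [0.0, 0.0], [0.0, 0.0], step=1, lr=0.1)
```

Lion's update has the same magnitude for every parameter because only the sign is used, and it stores one state value per parameter where AdamW stores two. Whether that accounts for the GPU efficiency difference reported here is not stated in this listing.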
