State Space Models are Strong Text Rerankers
Zhichao Xu, Jinghua Yan, Ashim Gupta, Vivek Srikumar

TL;DR
This paper evaluates state space models such as Mamba as efficient alternatives to transformers for text reranking. It finds competitive ranking performance but lower training and inference efficiency than transformers with flash attention, and shows that Mamba-2 improves on Mamba-1.
Contribution
It provides a comprehensive benchmark of SSM-based models against transformers for text reranking, revealing their potential and current limitations.
Findings
Mamba architectures achieve comparable ranking performance to transformers of similar size.
SSMs are less efficient in training and inference than transformers with flash attention.
Mamba-2 outperforms Mamba-1 in both performance and efficiency.
Abstract
Transformers dominate NLP and IR, but their inference inefficiencies and challenges in extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) like Mamba offer promising advantages, particularly O(1) time complexity in inference. Despite their potential, SSMs' effectiveness at text reranking -- a task requiring fine-grained query-document interaction and long-context understanding -- remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking tasks. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient…
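The inference-time advantage mentioned in the abstract comes from the recurrent form of an SSM: each new token updates a fixed-size hidden state, so the per-token cost does not grow with the length of the context already processed. The sketch below is a toy diagonal linear recurrence, not Mamba itself (Mamba makes its parameters input-dependent and uses a hardware-aware scan). The names `ssm_step` and `run_ssm` and the parameters `a`, `b`, `c` are hypothetical, chosen only for illustration.

```python
def ssm_step(state, x_t, a, b, c):
    """One recurrent step of a toy diagonal linear SSM (illustrative only).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimensions)
    y_t = sum(c * h_t)
    The work done here depends on the state size, not on how many
    tokens have been seen so far.
    """
    new_state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
    y_t = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y_t


def run_ssm(xs, a, b, c):
    """Process a scalar input sequence token by token with a fixed-size state."""
    state = [0.0] * len(a)
    ys = []
    for x_t in xs:
        state, y_t = ssm_step(state, x_t, a, b, c)
        ys.append(y_t)
    return state, ys


# Assumed toy parameters: a 2-dimensional state with fixed decay rates.
a = [0.5, 0.9]
b = [1.0, 1.0]
c = [1.0, -1.0]
final_state, outputs = run_ssm([1.0, 0.0, 2.0], a, b, c)
```

By contrast, a transformer decoding step attends over all cached keys and values, so its per-token cost grows with context length. The paper's finding that SSMs are nonetheless less efficient in practice than transformers with flash attention concerns measured training and inference efficiency, which this toy recurrence does not model.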
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
