RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Zhichao Xu

TL;DR
This paper evaluates Mamba, a model based on state space models, for document ranking tasks, comparing its performance and efficiency to transformer models, and provides insights into its applicability in information retrieval.
Contribution
The study benchmarks Mamba's effectiveness in document ranking, highlighting its competitive performance and efficiency challenges relative to transformer models.
Findings
Mamba models perform comparably to transformer models with similar training setups.
Mamba has lower training throughput than transformer models that use efficient attention implementations such as Flash Attention.
The work provides a foundation for exploring Mamba in other IR tasks.
Abstract
The transformer structure has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV) and information retrieval (IR). The transformer architecture's core mechanism, attention, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention. Recently, a notable model structure, Mamba, which is based on state space models, has achieved transformer-equivalent performance in multiple sequence modeling tasks. In this work, we examine Mamba's efficacy through the lens of a classical IR task: document ranking. A reranker model takes a query and a document as input, and predicts a scalar…
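The reranking interface the abstract describes (a query and a document go in, a scalar comes out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: `toy_score` and `rerank` are hypothetical names, the term-overlap scorer merely stands in for a trained Mamba or transformer reranker, and the convention that a higher score means a more relevant document is assumed.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a neural reranker.

    Returns the fraction of query terms that also appear in the document.
    A real reranker would replace this with a model forward pass.
    """
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Score each (query, document) pair and sort by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    # Assumption: higher score means more relevant.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "transformers use attention",
    "mamba for document ranking",
    "mamba is a snake",
]
ranked = rerank("mamba ranking", candidates)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

Because `rerank` only depends on the `score_fn` signature, swapping the toy scorer for any model that maps a query-document pair to a float leaves the ranking logic unchanged.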
Taxonomy
Topics: Data Mining and Machine Learning Applications · Multimedia Learning Systems · Educational Technology Systems
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout
