Mamba for Scalable and Efficient Personalized Recommendations
Andrew Starnes, Clayton Webster

TL;DR
This paper introduces FT-Mamba, a hybrid model combining Mamba layers with FT-Transformer architecture, significantly improving scalability and efficiency in personalized recommendation systems while maintaining high performance.
Contribution
The paper presents a novel hybrid model, FT-Mamba, that replaces Transformer layers with Mamba layers, reducing computational complexity and enhancing scalability in recommendation systems.
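To make the complexity claim concrete, here is a minimal sketch of a scalar state space recurrence, assuming fixed parameters `a`, `b`, and `c`. These names and the helper functions are hypothetical and are not taken from the paper. Mamba itself makes its parameters input-dependent (selective) and operates on vector-valued states, so this only illustrates why a recurrent scan costs one step per token while self-attention compares every pair of tokens.

```python
from typing import List


def ssm_scan(inputs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The parameters a, b, c are fixed here (an assumption for illustration);
    the loop visits each token exactly once.
    """
    h = 0.0  # hidden state carried from one token to the next
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def scan_step_count(n: int) -> int:
    """Number of state updates the scan performs for n tokens."""
    return n


def attention_pair_count(n: int) -> int:
    """Number of query-key pairs full self-attention scores for n tokens."""
    return n * n


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    for n in (10, 100, 1000):
        print(n, scan_step_count(n), attention_pair_count(n))
```

Under these assumptions, the work of the scan grows in proportion to the number of feature tokens, while the pairwise attention count grows with its square.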
Findings
FT-Mamba outperforms traditional Transformer models in efficiency.
FT-Mamba matches or exceeds the Transformer-based model's performance metrics across the evaluated datasets.
The Mamba architecture offers a scalable solution for large-scale recommendations.
Abstract
We propose using Mamba to handle tabular data in personalized recommendation systems. We present FT-Mamba (Feature Tokenizer + Mamba), a novel hybrid model that replaces the Transformer layers of the FT-Transformer architecture with Mamba layers. The Mamba model offers an efficient alternative to Transformers, reducing computational complexity from quadratic to linear in sequence length by enhancing the capabilities of State Space Models (SSMs). FT-Mamba is designed to improve the scalability and efficiency of recommendation systems while maintaining performance. We evaluate FT-Mamba against a traditional Transformer-based model within a Two-Tower architecture on three datasets: Spotify music recommendation, H&M fashion recommendation, and vaccine messaging recommendation. Each…
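As a rough illustration of the Two-Tower setup mentioned in the abstract, the sketch below encodes user and item features with separate towers and ranks items by the dot product of the resulting embeddings. Everything here is an assumption for illustration: each tower is a plain linear map (in the paper the encoder would be a Transformer-based or FT-Mamba model), dot-product scoring is a common choice that the summary does not confirm, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def linear_tower(features: Sequence[float], weights: Matrix) -> Vector:
    """Stand-in encoder: one linear map from raw features to an embedding."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product used as the (assumed) user-item similarity score."""
    return sum(a * b for a, b in zip(u, v))


def rank_items(
    user_features: Sequence[float],
    item_features_by_id: Dict[str, Sequence[float]],
    user_weights: Matrix,
    item_weights: Matrix,
) -> List[str]:
    """Return item ids sorted from highest to lowest score for one user."""
    user_vec = linear_tower(user_features, user_weights)
    scores = {
        item_id: dot(user_vec, linear_tower(feats, item_weights))
        for item_id, feats in item_features_by_id.items()
    }
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    items = {"a": [0.0, 1.0], "b": [1.0, 0.0]}
    print(rank_items([1.0, 0.0], items, identity, identity))
```

Because the two towers never see each other's inputs, item embeddings can be computed once and reused across users, which is one reason this layout is popular for large catalogs.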
Taxonomy
Topics: Recommender Systems and Techniques · Video Analysis and Summarization
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding
