Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models

Tobias Grantner; Emanuel Sallinger; Martin Flechl

arXiv:2604.18199·cs.CL·April 21, 2026

TL;DR

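This paper proposes recurrent language models as text embedders and introduces a vertically chunked inference strategy that generates embeddings quickly, with memory usage that stays constant in the input length once the input exceeds the vertical chunk size. The result is a practical alternative to transformer models for long sequences.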

Contribution

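The authors propose a vertically chunked inference strategy for recurrent models and show, by fine-tuning Mamba2 models, that recurrent architectures can serve as general-purpose text embedders with competitive benchmark performance and a substantially smaller memory footprint than transformer-based counterparts.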

Findings

01

With the vertically chunked inference strategy, the memory usage of recurrent models becomes constant in the input length once the input exceeds the vertical chunk size.

02

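Fine-tuned Mamba2 models are viable general-purpose text embedders: they perform competitively across a range of benchmarks while using substantially less memory than transformer-based counterparts.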

03

The inference strategy is validated empirically on Mamba2, RWKV, and xLSTM models, with consistent runtime-memory trade-offs across the three architectures.

Abstract

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked inference strategy that enables fast embedding generation with memory usage that becomes constant in the input length once it exceeds the vertical chunk size. By fine-tuning Mamba2 models, we demonstrate their viability as general-purpose text embedders, achieving competitive performance across a range of benchmarks while maintaining a substantially smaller memory footprint compared to transformer-based counterparts. We empirically validate the applicability of our inference strategy to Mamba2, RWKV, and xLSTM models, confirming consistent runtime-memory trade-offs across architectures and establishing recurrent models as a compelling alternative to…
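
To make the memory argument concrete, here is a minimal toy sketch of what a vertically chunked forward pass could look like. It is not the authors' implementation. It assumes that "vertically chunked" means pushing one block of tokens through all layers before moving to the next block, while carrying only each layer's recurrent state forward. The diagonal linear recurrence, the `LAYERS` parameters, the mean pooling, and all function names are hypothetical stand-ins chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]

# Hypothetical toy layers: each is a diagonal linear recurrence
# h_t = decay * h_{t-1} + gain * x_t, standing in for a real recurrent block.
LAYERS: List[Tuple[float, float]] = [(0.9, 0.5), (0.8, 1.0), (0.7, 0.3)]


def run_layer(params: Tuple[float, float], state: Vector,
              xs: List[Vector]) -> Tuple[List[Vector], Vector]:
    """Run one layer over a block of token vectors; return outputs and the updated state."""
    decay, gain = params
    outputs = []
    for x in xs:
        state = [decay * h + gain * v for h, v in zip(state, x)]
        outputs.append(state)
    return outputs, state


def mean_pool(total: Vector, count: int) -> Vector:
    """Mean pooling over token outputs (an assumed pooling choice)."""
    return [t / count for t in total]


def embed_layerwise(tokens: List[Vector], dim: int) -> Vector:
    """Baseline: every layer sees the whole sequence, so activations grow with its length."""
    acts = tokens
    for params in LAYERS:
        acts, _ = run_layer(params, [0.0] * dim, acts)
    total = [0.0] * dim
    for y in acts:
        total = [t + v for t, v in zip(total, y)]
    return mean_pool(total, len(tokens))


def embed_vertically_chunked(tokens: List[Vector], dim: int, chunk_size: int) -> Vector:
    """Push one chunk through all layers at a time, keeping only per-layer states."""
    states = [[0.0] * dim for _ in LAYERS]  # one recurrent state per layer
    total = [0.0] * dim                     # running sum for mean pooling
    for start in range(0, len(tokens), chunk_size):
        acts = tokens[start:start + chunk_size]
        for i, params in enumerate(LAYERS):
            acts, states[i] = run_layer(params, states[i], acts)
        for y in acts:                      # chunk activations are discarded after pooling
            total = [t + v for t, v in zip(total, y)]
    return mean_pool(total, len(tokens))
```

In this sketch the intermediate activations never cover more than `chunk_size` tokens, and the only data carried between chunks is one state per layer plus the pooling accumulator. That is the sense in which working memory stops growing once the input is longer than the chunk. Because the recurrence is applied to the same tokens in the same order, `embed_vertically_chunked` should match `embed_layerwise` for any chunk size on a non-empty input.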
