Studying the Soupability of Documents in State Space Models
Yasaman Jafari, Zixian Wang, Leon Bergen, Taylor Berg-Kirkpatrick

TL;DR
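This paper studies modular document encoding with state space models: documents are encoded independently and their hidden states are merged afterward. Finetuned Mamba2 models using the merged states match or exceed monolithic encoding on multi-hop QA, sparse retrieval, and long-document reasoning.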
Contribution
The paper introduces document souping: each document is encoded independently, and the resulting hidden states are pooled with simple operations such as averaging into a single context state that can be reused across queries.
Findings
On the RACE and QuALITY long-document QA benchmarks, souped representations substantially outperform a concatenation baseline.
Independently encoded documents can be reused across queries without reprocessing the full input.
Scales effectively to hundreds of documents for large-scale reasoning.
Abstract
We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post hoc to support downstream reasoning. Inspired by model souping, we study document souping, a strategy where documents are encoded independently, and their representations are pooled, via simple operations like averaging, into a single context state. This approach enables modular encoding and reuse without reprocessing the full input for each query. We demonstrate that finetuned Mamba2 models with souped representations achieve competitive or superior performance across multi-hop QA, sparse retrieval, and long-document reasoning tasks compared to the standard monolithic encoding approach. For example, on the RACE and QuALITY benchmarks for long document question answering, this method substantially outperforms a traditional concatenation approach. Crucially, this modular design scales to…
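The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes a toy diagonal linear recurrence in place of Mamba2, scalar "tokens", and hypothetical names (`DECAYS`, `encode_document`, `soup_states`); none of this is the authors' implementation. Only the averaging of independently computed states reflects what the abstract describes.

```python
from typing import List, Sequence

# Hypothetical per-channel decay rates for a toy diagonal recurrence.
# This stands in for an SSM layer; it is NOT Mamba2.
DECAYS = [0.5, 0.9, 0.99]


def encode_document(tokens: Sequence[float],
                    decays: Sequence[float] = DECAYS) -> List[float]:
    """Run h_t = a * h_{t-1} + (1 - a) * x_t per channel; return the final state."""
    state = [0.0] * len(decays)
    for x in tokens:
        state = [a * h + (1.0 - a) * x for a, h in zip(decays, state)]
    return state


def soup_states(states: Sequence[Sequence[float]]) -> List[float]:
    """Pool independently encoded document states by element-wise averaging."""
    if not states:
        raise ValueError("need at least one document state")
    n = len(states)
    return [sum(channel) / n for channel in zip(*states)]


# Each document is encoded once, independently of the others.
documents = [[1.0, 2.0], [3.0], [0.0, 4.0, 1.0]]
cached_states = [encode_document(doc) for doc in documents]

# The cached states are merged into one context state; a new query can
# reuse `cached_states` without re-encoding any document.
souped = soup_states(cached_states)
print(souped)
```

Because each state is computed separately, adding or removing a document only changes which cached states enter the average. In the paper's setting the pooled state is consumed by a finetuned model, which this sketch does not attempt to reproduce.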
