Bayesian Optimality of In-Context Learning with Selective State Spaces

Di Zhang; Jiaqi Xing

arXiv:2602.17744·cs.LG·February 23, 2026

Open Access

TL;DR

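This paper introduces Bayesian optimal sequential prediction as a framework for understanding in-context learning. It shows that meta-trained selective state space models asymptotically achieve Bayes-optimality on Linear Gaussian State Space Model (LG-SSM) tasks, and that on tasks with temporally correlated noise the Bayes-optimal predictor strictly outperforms any empirical risk minimization (ERM) estimator.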

Contribution

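The paper formalizes in-context learning as meta-learning over latent sequence tasks and proves that, for LG-SSM tasks, a meta-trained selective state space model asymptotically implements the Bayes-optimal predictor, that is, it converges to the posterior predictive mean. This gives a statistical-efficiency account of ICL that differs from the implicit-gradient-descent interpretation of Transformers.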

Findings

01

Selective SSMs converge faster to Bayes-optimal risk.

02

Selective SSMs show superior sample efficiency in structured-noise tasks.

03

Transformers track latent states more robustly than linear models.

Abstract

We propose Bayesian optimal sequential prediction as a new principle for understanding in-context learning (ICL). Unlike interpretations framing Transformers as performing implicit gradient descent, we formalize ICL as meta-learning over latent sequence tasks. For tasks governed by Linear Gaussian State Space Models (LG-SSMs), we prove a meta-trained selective SSM asymptotically implements the Bayes-optimal predictor, converging to the posterior predictive mean. We further establish a statistical separation from gradient descent, constructing tasks with temporally correlated noise where the optimal Bayesian predictor strictly outperforms any empirical risk minimization (ERM) estimator. Since Transformers can be seen as performing implicit ERM, this demonstrates selective SSMs achieve lower asymptotic risk due to superior statistical efficiency. Experiments on synthetic LG-SSM tasks and…
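For a single LG-SSM task whose parameters are known, the posterior predictive mean mentioned in the abstract is the one-step-ahead prediction of a Kalman filter. The sketch below illustrates that target quantity on a scalar model. It is not the paper's method or experimental setup: the scalar form, the parameter names (`a`, `c`, `q`, `r`, `m0`, `p0`), the helper names, and the "repeat the last observation" baseline are all assumptions chosen for illustration, and the meta-learning setting, where the task is latent, is not modeled here.

```python
import numpy as np


def simulate_lgssm(T, a, c, q, r, rng, m0=0.0, p0=1.0):
    """Sample observations from a scalar LG-SSM (illustrative, hypothetical setup).

    x_1 ~ N(m0, p0), x_{t+1} = a * x_t + w_t with w_t ~ N(0, q),
    y_t = c * x_t + v_t with v_t ~ N(0, r).
    """
    y = np.empty(T)
    x = m0 + np.sqrt(p0) * rng.standard_normal()
    for t in range(T):
        y[t] = c * x + np.sqrt(r) * rng.standard_normal()
        x = a * x + np.sqrt(q) * rng.standard_normal()
    return y


def kalman_predictive_means(y, a, c, q, r, m0=0.0, p0=1.0):
    """Return E[y_t | y_1..y_{t-1}] for every t, using the Kalman recursion."""
    m, p = m0, p0  # mean and variance of x_t given y_1..y_{t-1}
    preds = np.empty(len(y))
    for t, obs in enumerate(y):
        preds[t] = c * m  # posterior predictive mean of y_t
        # Measurement update: condition the state estimate on y_t.
        s = c * c * p + r  # predictive variance of y_t
        k = p * c / s  # Kalman gain
        m = m + k * (obs - c * m)
        p = (1.0 - k * c) * p
        # Time update: propagate the estimate to x_{t+1}.
        m = a * m
        p = a * a * p + q
    return preds


# Assumed parameter values, chosen only for the demonstration.
rng = np.random.default_rng(0)
y = simulate_lgssm(5000, a=0.9, c=1.0, q=0.1, r=1.0, rng=rng)
pred = kalman_predictive_means(y, a=0.9, c=1.0, q=0.1, r=1.0)

mse_kalman = np.mean((y - pred) ** 2)
mse_naive = np.mean((y[1:] - y[:-1]) ** 2)  # baseline: repeat the last observation
print(f"Kalman predictive MSE: {mse_kalman:.3f}")
print(f"Last-value baseline MSE: {mse_naive:.3f}")
```

On a long sequence from this model, the Kalman predictions should show lower squared error than the last-value baseline, since the filter computes the conditional mean under the true model. A meta-trained model that is Bayes-optimal in the abstract's sense would need to match this kind of predictor without being given the task parameters.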

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis