Genomic Next-Token Predictors are In-Context Learners
Nathan Breslow, Aayush Mishra, Mahler Revsine, Michael C. Schatz, Anqi Liu, Daniel Khashabi

TL;DR
This paper demonstrates that in-context learning, previously observed in language models, also organically emerges in genomic sequence models trained for next-nucleotide prediction, suggesting a modality-agnostic origin of this phenomenon.
Contribution
The study provides the first evidence of emergent in-context learning in genomic models, extending the concept beyond language and supporting a unified view of ICL across modalities.
Findings
Pattern-induction performance in genomic models improves log-linearly as the number of in-context demonstrations increases.
In-context learning occurs in genomic models trained solely on next-nucleotide prediction.
The results support the hypothesis that ICL arises from large-scale predictive modeling over rich data.
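To make "log-linear" concrete: it describes accuracy that grows roughly as a + b · ln(k), where k is the number of demonstrations, so each doubling of k adds about the same increment. The sketch below fits that form by ordinary least squares. The function name `fit_log_linear` and all numbers are assumptions made for illustration; the values are synthetic and are not results from the paper, and this is not the authors' analysis code.

```python
import math

def fit_log_linear(shots, accuracies):
    """Least-squares fit of accuracy = a + b * ln(shots). Hypothetical helper."""
    xs = [math.log(k) for k in shots]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    # Closed-form simple linear regression on (ln k, accuracy) pairs.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic, illustrative values only (NOT measurements from the paper):
# generated from an exact log-linear curve so the fit can be checked.
shots = [1, 2, 4, 8, 16, 32]
synthetic_acc = [0.10 + 0.05 * math.log(k) for k in shots]

intercept, slope = fit_log_linear(shots, synthetic_acc)
print(f"intercept={intercept:.3f}, slope per ln(k)={slope:.3f}")
```

With measured accuracies in place of the synthetic ones, a roughly constant gain per doubling of k (about b · ln 2) would be the signature of a log-linear trend.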
Abstract
In-context learning (ICL) -- the capacity of a model to infer and apply abstract patterns from examples provided within its input -- has been extensively studied in large language models trained for next-token prediction on human text. In fact, prior work often attributes this emergent behavior to distinctive statistical properties in human language. This raises a fundamental question: can ICL arise organically in other sequence domains purely through large-scale predictive training? To explore this, we turn to genomic sequences, an alternative symbolic domain rich in statistical structure. Specifically, we study the Evo2 genomic model, trained predominantly on next-nucleotide (A/T/C/G) prediction, at a scale comparable to mid-sized LLMs. We develop a controlled experimental framework comprising symbolic reasoning tasks instantiated in both linguistic and genomic forms, enabling…
Taxonomy
Topics: Language and cultural evolution · Language Development and Disorders · Topic Modeling
