TL;DR
This paper demonstrates that pre-trained large language models can learn and predict sequences generated by Hidden Markov Models through in-context learning, achieving near-optimal accuracy on synthetic data and competitive results on real-world tasks.
Contribution
It shows that LLMs can effectively model HMM-generated data via in-context learning, providing new insights and practical guidelines for scientific data analysis.
Findings
LLMs achieve near-theoretical accuracy on synthetic HMM data
Scaling trends are influenced by HMM properties
ICL performs competitively on real-world animal decision-making tasks
Abstract
Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL), their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the…
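To make the "theoretical optimum" concrete, here is a minimal sketch of one common way such a reference could be computed: an oracle that knows the true HMM parameters and predicts the next observation by forward filtering. Whether the paper uses exactly this baseline is an assumption; the function names and the two-state parameters `PI`, `A`, and `B` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical two-state, two-symbol HMM (illustrative values, not from the paper).
PI = [0.5, 0.5]                      # initial state distribution
A = [[0.9, 0.1], [0.1, 0.9]]         # A[i][j] = P(next state j | state i)
B = [[0.8, 0.2], [0.2, 0.8]]         # B[i][k] = P(observation k | state i)


def sample_hmm(pi, trans, emit, length, rng):
    """Sample an observation sequence of the given length from the HMM."""
    def draw(probs):
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    state = draw(pi)
    obs = []
    for _ in range(length):
        obs.append(draw(emit[state]))
        state = draw(trans[state])
    return obs


def next_obs_distribution(pi, trans, emit, obs):
    """Forward filtering: P(next observation | observations so far)."""
    n = len(pi)
    belief = list(pi)  # predictive distribution over the current hidden state
    for o in obs:
        # Condition the state belief on the observed symbol.
        post = [belief[i] * emit[i][o] for i in range(n)]
        z = sum(post)
        post = [p / z for p in post]
        # Propagate one step through the transition matrix.
        belief = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
    m = len(emit[0])
    return [sum(belief[i] * emit[i][k] for i in range(n)) for k in range(m)]


def oracle_accuracy(pi, trans, emit, obs):
    """Fraction of next-symbol predictions the known-parameter oracle gets right."""
    correct = 0
    for t in range(1, len(obs)):
        dist = next_obs_distribution(pi, trans, emit, obs[:t])
        pred = max(range(len(dist)), key=dist.__getitem__)
        correct += int(pred == obs[t])
    return correct / (len(obs) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = sample_hmm(PI, A, B, 200, rng)
    print("oracle next-symbol accuracy:", oracle_accuracy(PI, A, B, seq))
```

An in-context learner sees only the sampled sequence in its prompt, not the parameters, so comparing its next-symbol accuracy on the same sequence against this oracle gives a rough sense of how close it gets to the best achievable prediction.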
