Prompting Underestimates LLM Capability for Time Series Classification
Dan Schumacher, Erfan Nourbakhsh, Rocky Slavin, Anthony Rios

TL;DR
This paper shows that large language models encode substantial information about time series internally, but that prompt-based evaluations underestimate it: linear probes trained on the models' internal representations perform far better than the models' prompted answers.
Contribution
It demonstrates that prompt-based assessments underestimate LLMs' time series understanding and shows that internal representations are more informative than prompt outputs.
Findings
Linear probes raise average F1 from 0.15-0.26 (zero-shot prompting) to 0.61-0.67
Class-discriminative time series information emerges in early transformer layers
Prompt-based evaluations underestimate LLMs' capabilities
Abstract
Prompt-based evaluations suggest that large language models (LLMs) perform poorly on time series classification, raising doubts about whether they encode meaningful temporal structure. We show that this conclusion reflects limitations of prompt-based generation rather than the model's representational capacity by directly comparing prompt outputs with linear probes over the same internal representations. While zero-shot prompting performs near chance, linear probes improve average F1 from 0.15-0.26 to 0.61-0.67, often matching or exceeding specialized time series models. Layer-wise analyses further show that class-discriminative time series information emerges in early transformer layers and is amplified by visual and multimodal inputs. Together, these results demonstrate a systematic mismatch between what LLMs internally represent and what prompt-based evaluation reveals, leading…
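The core measurement here is a linear probe: a linear classifier trained on frozen hidden states, with the LLM itself left untouched. The sketch below illustrates that general technique in plain Python and is not the authors' code. It makes several loudly stated assumptions: the hidden states are synthetic Gaussian clusters produced by the hypothetical `fake_hidden_states` helper rather than real LLM activations, the `separation` parameter is an illustrative stand-in for how class-discriminative a given layer is, and F1 is macro-averaged over classes (the paper's exact averaging is not specified in this summary).

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def train_linear_probe(X, y, n_classes, lr=0.05, epochs=150):
    """Multinomial logistic regression on frozen features.

    Only W and b are trained; the features X are treated as fixed
    representations (e.g. one layer's hidden states).
    """
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            logits = [sum(w * xj for w, xj in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. logit c is p_c - 1[c == label].
                err = p[c] - (1.0 if c == label else 0.0)
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        # Full-batch gradient descent step on the mean loss.
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict(W, b, X):
    # Argmax over the probe's linear scores.
    preds = []
    for x in X:
        scores = [sum(w * xj for w, xj in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds


def macro_f1(y_true, y_pred, n_classes):
    # Per-class F1 = 2TP / (2TP + FP + FN), then the unweighted mean over classes.
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


def fake_hidden_states(n, d, n_classes, separation, seed):
    """Hypothetical stand-in for one layer's hidden states: Gaussian class clusters.

    Larger `separation` spreads the class centers further apart, mimicking a
    layer whose representations are more class-discriminative.
    """
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 1) * separation for _ in range(d)] for _ in range(n_classes)]
    X, y = [], []
    for i in range(n):
        label = i % n_classes
        X.append([centers[label][j] + rng.gauss(0, 1) for j in range(d)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    # Illustrative layer-wise sweep: train one probe per (synthetic) layer.
    n_classes, d = 3, 8
    for layer, sep in enumerate([0.0, 0.5, 2.0]):
        X, y = fake_hidden_states(300, d, n_classes, sep, seed=layer)
        W, b = train_linear_probe(X[:200], y[:200], n_classes)
        score = macro_f1(y[200:], predict(W, b, X[200:]), n_classes)
        print(f"layer {layer}: macro-F1 = {score:.2f}")
```

Running one probe per layer, as in the `__main__` block, mirrors the shape of a layer-wise analysis: the probe is deliberately kept linear so that a high score reflects information already present in the representation rather than capacity added by the classifier.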
Taxonomy
Topics: Time Series Analysis and Forecasting · Language and cultural evolution · Topic Modeling
