TimeSeriesExam: A time series understanding exam
Yifu Cai, Arjun Choudhry, Mononito Goswami, Artur Dubrawski

TL;DR
This paper introduces TimeSeriesExam, a comprehensive multiple-choice question test to evaluate large language models' understanding of core time series concepts, revealing strengths and weaknesses across different models.
Contribution
The paper presents a configurable, scalable exam of over 700 questions that systematically assesses LLMs' understanding of time series data, addressing the limited knowledge of what these models actually understand about such data.
Findings
GPT-4 and Gemini outperform open-source models on simple concepts
All models struggle with causality analysis in time series
Question generation is key for assessing LLM understanding
Abstract
Large Language Models (LLMs) have recently demonstrated a remarkable ability to model time series data. These capabilities can be partly explained if LLMs understand basic time series concepts. However, our knowledge of what these models understand about time series data remains relatively limited. To address this gap, we introduce TimeSeriesExam, a configurable and scalable multiple-choice question exam designed to assess LLMs across five core time series understanding categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality analysis. TimeSeriesExam comprises over 700 questions, procedurally generated using 104 carefully curated templates and iteratively refined to balance question difficulty against the ability to discriminate good models from bad ones. We test 7 state-of-the-art LLMs on the TimeSeriesExam and provide the first comprehensive…
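To make the idea of procedurally generating multiple-choice questions from templates more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `Question`, `make_series`, `trend_template`, and `generate_exam`, the single trend template, and all parameter values are assumptions chosen for illustration. Each template draws a synthetic series and records the correct option, so the answer key follows from how the series was built.

```python
import random
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical structure, for illustration)."""
    category: str
    prompt: str
    series: list
    options: list
    answer: str


def make_series(n, slope, noise_std, rng):
    """Synthetic series: a linear trend plus Gaussian noise."""
    return [slope * t + rng.gauss(0.0, noise_std) for t in range(n)]


def trend_template(rng, n=64):
    """Hypothetical template: ask whether the series has a linear trend."""
    has_trend = rng.random() < 0.5
    # Assumed slope range; a slope of 0.0 means noise only.
    slope = rng.uniform(0.5, 1.0) if has_trend else 0.0
    return Question(
        category="pattern recognition",
        prompt="Does the following time series have a linear trend?",
        series=make_series(n, slope, noise_std=0.1, rng=rng),
        options=["Yes", "No"],
        answer="Yes" if has_trend else "No",
    )


def generate_exam(templates, n_questions, seed=0):
    """Sample templates with a seeded generator so the exam is reproducible."""
    rng = random.Random(seed)
    return [rng.choice(templates)(rng) for _ in range(n_questions)]


exam = generate_exam([trend_template], n_questions=5, seed=0)
```

Under these assumptions, adding a category would mean adding another template function with the same signature, and changing `n_questions` or `seed` yields a different but reproducible exam.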
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout
