TSAQA: Time Series Analysis Question And Answering Benchmark
Baoyu Jing, Sanhorn Chen, Lecheng Zheng, Boyu Liu, Zihao Li, Jiaru Zou, Tianxin Wei, Zhining Liu, Zhichen Zeng, Ruizhong Qiu, Xiao Lin, Yuchen Yan, Dongqi Fu, Jingchao Ni, Jingrui He, Hanghang Tong

TL;DR
TSAQA is a comprehensive benchmark that evaluates diverse time series analysis tasks across multiple domains, revealing current LLMs' limited capabilities in understanding complex temporal data.
Contribution
The paper introduces TSAQA, a unified, multi-task benchmark covering a wide range of time series analysis tasks with a large, diverse dataset for evaluating model performance.
Findings
Current LLMs perform poorly on TSAQA tasks, with the best model scoring only 65.08.
Instruction tuning improves open-source model performance but still leaves significant room for improvement.
TSAQA highlights the complexity of temporal analysis for large language models.
Abstract
Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. While recent work has begun to explore multi-task time series question answering (QA), current benchmarks remain limited to forecasting and anomaly detection tasks. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities. TSAQA integrates six diverse tasks under a single framework, ranging from conventional analysis, including anomaly detection and classification, to advanced analysis, such as characterization, comparison, data transformation, and temporal relationship analysis. Spanning 210k samples across 13 domains, the dataset employs diverse question formats, including true-or-false (TF), multiple-choice (MC), and a novel puzzling (PZ) format, to comprehensively assess time series…
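To make the TF and MC question formats concrete, here is a minimal sketch of how answers in those two formats could be scored by exact match and reported per format. It is an illustration only: the item fields (`task`, `format`, `answer`), the sample items, and the exact-match percentage metric are all assumptions, not the benchmark's actual schema or scoring protocol, and the PZ format is left out because the text above does not define it.

```python
from collections import defaultdict

# Hypothetical items: field names and contents are illustrative assumptions,
# not the actual TSAQA data schema.
items = [
    {"task": "anomaly detection", "format": "TF", "answer": "True"},
    {"task": "classification", "format": "MC", "answer": "B"},
    {"task": "comparison", "format": "MC", "answer": "C"},
    {"task": "characterization", "format": "TF", "answer": "False"},
]

# Hypothetical model outputs, one per item, in the same order.
predictions = ["true", "B", "A", "False"]


def normalize(text):
    """Lowercase and strip whitespace so 'True' and ' true ' compare equal."""
    return text.strip().lower()


def accuracy_by_format(items, predictions):
    """Return exact-match accuracy (as a percentage) for each question format."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item["format"]] += 1
        if normalize(pred) == normalize(item["answer"]):
            correct[item["format"]] += 1
    return {fmt: 100.0 * correct[fmt] / total[fmt] for fmt in total}


scores = accuracy_by_format(items, predictions)
print(scores)
```

Grouping by format keeps a strong TF score from masking a weak MC score. The same grouping could be keyed on `task` to compare the six task types.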
Taxonomy
Topics: Time Series Analysis and Forecasting · Topic Modeling · Machine Learning in Healthcare
