Large Language Models for Predictive Analysis: How Far Are They?
Qin Chen, Yuanyi Ren, Xiaojun Ma, Yuyang Shi

TL;DR
This paper introduces the PredictiQ benchmark to evaluate large language models' capabilities in predictive analysis across diverse real-world datasets, revealing significant current limitations.
Contribution
The study presents the first comprehensive benchmark, PredictiQ, assessing 12 LLMs on 1130 predictive queries from 44 datasets across 8 fields.
Findings
LLMs face challenges in predictive analysis tasks
Benchmark reveals gaps in current LLM capabilities
Provides insights for future improvements
Abstract
Predictive analysis is a cornerstone of modern decision-making, with applications in various domains. Large Language Models (LLMs) have emerged as powerful tools for nuanced, knowledge-intensive conversations, and can therefore aid complex decision-making tasks. With the growing expectation that LLMs will be used for predictive analysis, there is an urgent need to systematically assess their capability in this domain. However, existing studies lack relevant evaluations. To bridge this gap, we introduce the PredictiQ benchmark, which integrates 1130 sophisticated predictive analysis queries drawn from 44 real-world datasets in 8 diverse fields. We design an evaluation protocol that considers text analysis, code generation, and their alignment. Twelve well-known LLMs are evaluated, offering insights into their practical use in predictive analysis. Generally, we…
