Large Language Models for Predictive Analysis: How Far Are They?

Qin Chen; Yuanyi Ren; Xiaojun Ma; Yuyang Shi

arXiv:2505.17149·cs.CL·May 26, 2025

TL;DR

This paper introduces the PredictiQ benchmark to evaluate large language models' capabilities in predictive analysis across diverse real-world datasets, revealing significant current limitations.

Contribution

The study presents the first comprehensive benchmark, PredictiQ, assessing 12 LLMs on 1130 predictive queries from 44 datasets across 8 fields.

Findings

01. LLMs face challenges in predictive analysis tasks
02. Benchmark reveals gaps in current LLM capabilities
03. Provides insights for future improvements

Abstract

Predictive analysis is a cornerstone of modern decision-making, with applications in various domains. Large Language Models (LLMs) have emerged as powerful tools in enabling nuanced, knowledge-intensive conversations, thus aiding in complex decision-making tasks. With the burgeoning expectation to harness LLMs for predictive analysis, there is an urgent need to systematically assess their capability in this domain. However, there is a lack of relevant evaluations in existing studies. To bridge this gap, we introduce the PredictiQ benchmark, which integrates 1130 sophisticated predictive analysis queries originating from 44 real-world datasets of 8 diverse fields. We design an evaluation protocol considering text analysis, code generation, and their alignment. Twelve renowned LLMs are evaluated, offering insights into their practical use in predictive analysis. Generally, we…
