MIRAI: Evaluating LLM Agents for Event Forecasting
Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

TL;DR
MIRAI introduces a comprehensive benchmark to evaluate large language model agents' ability to predict international events across various time horizons, addressing the lack of systematic assessment tools in this domain.
Contribution
This paper presents MIRAI, a novel benchmark that systematically evaluates LLM agents' forecasting capabilities using a curated event database and tool integration, filling a critical gap in the field.
Findings
Demonstrates LLM agents' ability to source and integrate global event data.
Assesses agents' proficiency in code-based tool use for prediction.
Provides insights into short-term and long-term forecasting accuracy.
Abstract
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and reason over it to solve complex problems. Given this capability, increasing interest has been directed toward employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is no rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful…
Taxonomy
Topics: Time Series Analysis and Forecasting · Advanced Database Systems and Queries · Data Quality and Management
