FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

Zhixin Han; Yanzhi Zhang; Chuyang Wei; Maohang Gao; Xiawei Yue; Kefei Chen; Yu Zhuang; Haoxiang Guan; Jiyan He; Jian Li; Yitong Duan; Yu Shi; Mengting Hu; Shuxin Zheng

arXiv:2604.26733·cs.AI·May 18, 2026

TL;DR

FutureWorld introduces a live reinforcement learning environment that improves predictive agents by integrating real-world outcome feedback into the training process, enhancing prediction accuracy and calibration.

Contribution

The paper presents verl-tool-future, a framework built by modifying and extending verl-tool, that incorporates delayed real-world outcomes into reinforcement learning for predictive agents.

Findings

01

Successive training rounds improve prediction accuracy.

02

Training also improves probabilistic scores and calibration.

03

Delayed outcome feedback works effectively as a reinforcement signal.
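The findings above mention probabilistic scoring and calibration without naming specific metrics on this page. As a rough illustration only, the sketch below computes two commonly used quantities for binary forecasts: the Brier score and a binned expected calibration error. The choice of these metrics, the function names, and the bin count are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean forecast and observed frequency per bin."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - observed)
    return ece
```

Lower is better for both quantities. A forecaster who always answers 0.5 on a balanced set of questions is perfectly calibrated under this measure but still has a Brier score of 0.25, which is why the two are usually reported together.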

Abstract

Live future prediction is the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Such a task can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage these advantages, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. Specifically, we modify and extend verl-tool, resulting in a new framework that we call verl-tool-future. Unlike standard reinforcement learning training frameworks that rely on immediate rewards, verl-tool-future stores prediction-time rollouts, backfills rewards after…
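The store-then-backfill pattern described in the abstract can be pictured with a small buffer that holds rollouts without rewards until the corresponding question resolves. The sketch below is not the verl-tool-future API: the class names, fields, and the default reward function (negative squared error against the binary outcome) are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StoredRollout:
    """Hypothetical record of one prediction-time rollout."""
    question_id: str
    predicted_prob: float            # agent's probability that the event occurs
    reward: Optional[float] = None   # unknown until the event resolves


def negative_squared_error(prob: float, outcome: int) -> float:
    """Assumed reward for illustration; the paper's reward may differ."""
    return -((prob - outcome) ** 2)


class DelayedRewardBuffer:
    """Holds rollouts until real-world outcomes arrive, then releases them."""

    def __init__(
        self, reward_fn: Callable[[float, int], float] = negative_squared_error
    ) -> None:
        self._reward_fn = reward_fn
        self._pending: Dict[str, List[StoredRollout]] = {}
        self._ready: List[StoredRollout] = []

    def store(self, rollout: StoredRollout) -> None:
        """Save a rollout at prediction time, before any reward exists."""
        self._pending.setdefault(rollout.question_id, []).append(rollout)

    def backfill(self, question_id: str, outcome: int) -> int:
        """Attach rewards to all rollouts for a resolved question."""
        resolved = self._pending.pop(question_id, [])
        for rollout in resolved:
            rollout.reward = self._reward_fn(rollout.predicted_prob, outcome)
            self._ready.append(rollout)
        return len(resolved)

    def drain_ready(self) -> List[StoredRollout]:
        """Return rewarded rollouts for a parameter update and clear them."""
        batch, self._ready = self._ready, []
        return batch


# Usage: two predictions are stored, one question resolves later.
buffer = DelayedRewardBuffer()
buffer.store(StoredRollout("q1", 0.8))
buffer.store(StoredRollout("q2", 0.3))
num_resolved = buffer.backfill("q1", outcome=1)
batch = buffer.drain_ready()
```

In this sketch only the rollout for "q1" reaches the training batch; the rollout for "q2" stays pending until its outcome is known, which mirrors the separation between prediction time and reward time.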
