Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents

Gil Pasternak; Dheeraj Rajagopal; Julia White; Dhruv Atreja; Matthew Thomas; George Hurn-Maloney; Ash Lewis

arXiv:2510.19771·cs.AI·February 20, 2026

TL;DR

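This paper introduces PROBE (Proactive Resolution Of BottlEnecks), a benchmark that evaluates proactive problem solving in LLM agents across three capabilities: searching for unspecified issues, identifying specific bottlenecks, and executing resolutions. Even state-of-the-art models struggle on it.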

Contribution

We propose PROBE, a comprehensive benchmark decomposing proactivity into search, identification, and resolution, to evaluate and compare LLM agents' autonomous problem-solving abilities.
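
To make the three-stage decomposition concrete, the following is a minimal sketch of how per-stage and end-to-end success rates could be tallied for a pipeline of search, identification, and resolution. It is not the authors' scoring code: the `EpisodeOutcome` record, its field names, and the rule that an episode counts as an end-to-end success only when all three stages succeed are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class EpisodeOutcome:
    """Hypothetical per-episode record; field names are illustrative only."""
    search_ok: bool          # stage 1: the agent surfaced the relevant material
    identification_ok: bool  # stage 2: the agent named the specific bottleneck
    resolution_ok: bool      # stage 3: the agent executed an appropriate fix


def end_to_end_success(outcome: EpisodeOutcome) -> bool:
    # Assumption: a pipeline run succeeds only if every stage succeeds.
    return (
        outcome.search_ok
        and outcome.identification_ok
        and outcome.resolution_ok
    )


def stage_rates(outcomes: Sequence[EpisodeOutcome]) -> Dict[str, float]:
    """Return the fraction of episodes that pass each stage and all stages."""
    if not outcomes:
        raise ValueError("at least one episode is required")
    n = len(outcomes)
    return {
        "search": sum(o.search_ok for o in outcomes) / n,
        "identification": sum(o.identification_ok for o in outcomes) / n,
        "resolution": sum(o.resolution_ok for o in outcomes) / n,
        "end_to_end": sum(end_to_end_success(o) for o in outcomes) / n,
    }


# Toy data, invented for this sketch (not results from the paper).
toy_outcomes = [
    EpisodeOutcome(True, True, True),
    EpisodeOutcome(True, True, False),
    EpisodeOutcome(True, False, False),
    EpisodeOutcome(False, False, False),
]
toy_rates = stage_rates(toy_outcomes)
```

Reporting the stages separately, as in this sketch, shows where a pipeline loses episodes: under the all-stages rule, the end-to-end rate can never exceed the rate of the weakest stage.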

Findings

01

The best end-to-end performance among the evaluated frontier LLMs and agents is 40%.

02

Current models struggle with multi-source reasoning and long-term planning.

03

Analysis of failure modes suggests directions for future research.

Abstract

LLM-based agents are increasingly moving towards proactivity: rather than awaiting instruction, they exercise agency to anticipate user needs and solve them autonomously. However, evaluating proactivity is challenging; current benchmarks are constrained to localized context, limiting their ability to test reasoning across sources and longer time horizons. To address this gap, we present PROBE (Proactive Resolution Of BottlEnecks). PROBE decomposes proactivity as a pipeline of three core capabilities: (1) searching for unspecified issues, (2) identifying specific bottlenecks, and (3) executing appropriate resolutions. We apply PROBE to evaluate leading LLMs and popular agentic frameworks, showing that even state-of-the-art models struggle to solve this benchmark. Computing our consistent measurements across frontier LLMs and agents, we find that the best end-to-end performance of 40% is…
