LHAW: Controllable Underspecification for Long-Horizon Tasks

George Pu; Michael S. Lee; Udari Madhushani Sehwag; David J. Lee; Bryan Zhu; Yash Maurya; Mohit Raghavendra; Yuan Xue; Samuel Marc Denton

arXiv:2602.10525 · cs.CL · March 23, 2026

TL;DR

LHAW introduces a modular pipeline to systematically create and evaluate task variants with controlled ambiguity, enabling better assessment and development of autonomous agents for long-horizon tasks.

Contribution

The paper presents LHAW, a framework for generating ambiguous variants of long-horizon tasks and measuring the impact of that ambiguity, which enables scalable, task-agnostic evaluation of how well agents seek clarification.

Findings

01. 285 task variants released for evaluation

02. Current agents show varied ability to detect and resolve underspecification

03. LHAW enables cost-sensitive assessment of clarification strategies

Abstract

Long-horizon workflow agents that operate effectively over extended periods are essential for truly autonomous systems. Their reliable execution critically depends on the ability to reason through ambiguous situations in which clarification seeking is necessary to ensure correct task execution. However, progress is limited by the lack of scalable, task-agnostic frameworks for systematically curating and measuring the impact of ambiguity across custom workflows. We address this gap by introducing LHAW (Long-Horizon Augmented Workflows), a modular, dataset-agnostic synthetic pipeline that transforms any well-specified task into controllable underspecified variants by systematically removing information across four dimensions - Goals, Constraints, Inputs, and Context - at configurable severity levels. Unlike approaches that rely on LLM predictions of ambiguity, LHAW validates variants…
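The removal step described in the abstract can be illustrated with a small sketch. This is not the LHAW implementation: the `Segment` type, the `underspecify` function, the example task, and the interpretation of "severity" as a count of removed segments are all assumptions made for illustration. Only the four dimension names come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# The four dimensions named in the abstract.
DIMENSIONS = ("goals", "constraints", "inputs", "context")


@dataclass(frozen=True)
class Segment:
    """One piece of a task description, tagged with a dimension (hypothetical type)."""
    dimension: str
    text: str


def underspecify(segments: List[Segment], dimension: str, severity: int) -> List[Segment]:
    """Return a copy of the task with up to `severity` segments of `dimension` removed.

    Assumption: severity is modelled here as a simple count of removed segments.
    """
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    if severity < 0:
        raise ValueError("severity must be non-negative")
    kept: List[Segment] = []
    removed = 0
    for seg in segments:
        if seg.dimension == dimension and removed < severity:
            removed += 1  # drop this segment from the variant
            continue
        kept.append(seg)
    return kept


def render(segments: List[Segment]) -> str:
    """Join the remaining segments back into a task prompt."""
    return " ".join(seg.text for seg in segments)


# A made-up, well-specified task used only for illustration.
task = [
    Segment("goals", "Produce a quarterly sales report."),
    Segment("inputs", "Use the file sales_q3.csv."),
    Segment("constraints", "Keep it under two pages."),
    Segment("context", "The audience is the finance team."),
]

# Remove one Inputs segment: the agent no longer knows which file to use.
variant = underspecify(task, "inputs", 1)
print(render(variant))
```

In this toy variant the agent would have to ask which file to use, which is the kind of clarification behavior the benchmark is meant to probe. The abstract also mentions that variants are validated, but that passage is truncated, so validation is not sketched here.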

Code & Models

Datasets

ScaleAI/lhaw (dataset, 121 downloads)

Taxonomy

Topics: Scientific Computing and Data Management · Business Process Modeling and Analysis · Explainable Artificial Intelligence (XAI)