Don't Start What You Can't Finish: A Counterfactual Audit of Support-State Triage in LLM Agents
Eren Unlu

TL;DR
This paper introduces the Support-State Triage Audit (SSTA-32), a diagnostic framework for evaluating whether LLM agents can identify why a task is blocked and choose the matching response (answer, clarify, request support, or abstain) before acting.
Contribution
It presents a matched-item counterfactual diagnostic method and evaluates a frontier model's ability to triage tasks across four support states under four prompting conditions.
Findings
Default execution overcommits on non-complete tasks (41.7%).
Typed deferral accuracy reaches 58.3% with confidence mapping.
Action-Only and PSC prompting achieve 91.7% deferral accuracy.
Abstract
Current agent evaluations largely reward execution on fully specified tasks, while recent work studies clarification [11, 22, 2], capability awareness [9, 1], abstention [8, 14], and search termination [20, 5] mostly in isolation. This leaves open whether agents can diagnose why a task is blocked before acting. We introduce the Support-State Triage Audit (SSTA-32), a matched-item diagnostic framework in which minimal counterfactual edits flip the same base request across four support states: Complete (ANSWER), Clarifiable (CLARIFY), Support-Blocked (REQUEST SUPPORT), and Unsupported-Now (ABSTAIN). We evaluate a frontier model under four prompting conditions - Direct, Action-Only, Confidence-Only, and a typed Preflight Support Check (PSC) - using Dual-Persona Auto-Auditing (DPAA) with deterministic heuristic scoring. Default execution overcommits heavily on non-complete tasks (41.7%…
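The state-to-action mapping described in the abstract can be illustrated with a minimal Python sketch. The four states and their expected actions come from the abstract. Everything else is an assumption made for illustration: the names `SupportState`, `overcommit_rate`, and `typed_deferral_accuracy` are hypothetical, the metric definitions are one plausible reading of "overcommits" and "typed deferral accuracy", and the toy data is invented. This is not the paper's DPAA scoring procedure.

```python
from enum import Enum


class SupportState(Enum):
    """The four support states; each value is the expected action label."""
    COMPLETE = "ANSWER"
    CLARIFIABLE = "CLARIFY"
    SUPPORT_BLOCKED = "REQUEST SUPPORT"
    UNSUPPORTED_NOW = "ABSTAIN"


def _non_complete(items):
    # Keep only (state, action) pairs whose gold state is not COMPLETE.
    return [(s, a) for s, a in items if s is not SupportState.COMPLETE]


def overcommit_rate(items):
    """Assumed definition: share of non-complete items answered anyway."""
    blocked = _non_complete(items)
    if not blocked:
        return 0.0
    return sum(a == "ANSWER" for _, a in blocked) / len(blocked)


def typed_deferral_accuracy(items):
    """Assumed definition: share of non-complete items where the agent
    chose the exact deferral type that matches the gold state."""
    blocked = _non_complete(items)
    if not blocked:
        return 0.0
    return sum(a == s.value for s, a in blocked) / len(blocked)


# Invented toy data: (gold state, action the agent took).
toy_items = [
    (SupportState.COMPLETE, "ANSWER"),
    (SupportState.CLARIFIABLE, "ANSWER"),                # overcommit
    (SupportState.SUPPORT_BLOCKED, "REQUEST SUPPORT"),   # correct type
    (SupportState.UNSUPPORTED_NOW, "CLARIFY"),           # deferred, wrong type
]

print(overcommit_rate(toy_items))
print(typed_deferral_accuracy(toy_items))
```

Under these assumed definitions, an agent can defer without answering and still lose typed accuracy, as in the last toy item, which is the distinction between avoiding overcommitment and diagnosing why a task is blocked.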
