Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models

Matthew DosSantos DiSorbo; Harang Ju

arXiv:2604.08588 · cs.LG · April 13, 2026

TL;DR

This paper models how language models decide whether to act or escalate in automation tasks, analyzing their implicit decision thresholds, the calibration of their self-estimates, and their response to interventions across five domains and multiple model families.

Contribution

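It introduces a decision-theoretic framework for escalation in LLMs, evaluates models' implicit thresholds and the calibration of their self-estimates, and tests interventions, including training models to follow the desired escalation rule, that make escalation decisions more robust.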

Findings

01

Escalation thresholds vary across models and domains and are not predicted by model size or architecture.

02

Models' self-estimates of correctness are often miscalibrated, in model-specific ways.

03

Chain-of-thought fine-tuning yields robust escalation policies that generalize well.
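Finding 02 can be made concrete with a standard calibration check. The sketch below computes binned expected calibration error (ECE) and a signed overconfidence gap from a model's self-reported probabilities of being correct. ECE is a common metric assumed here for illustration only; this summary does not say which calibration measure the paper uses, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _check(confidences: Sequence[float], correct: Sequence[bool]) -> int:
    # Hypothetical helper: validate inputs and return the sample count.
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(not 0.0 <= p <= 1.0 for p in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    return len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    n = _check(confidences, correct)
    if n == 0:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, c in zip(confidences, correct):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        accuracy = sum(1 for _, c in b if c) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


def overconfidence(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy: > 0 overconfident, < 0 underconfident."""
    n = _check(confidences, correct)
    if n == 0:
        return 0.0
    return sum(confidences) / n - sum(1 for c in correct if c) / n
```

A model that always reports 0.9 confidence but is right half the time has an ECE of about 0.4 and a positive overconfidence gap of the same size; the signed gap is what distinguishes the direction of a model-specific miscalibration.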

Abstract

Effective automation hinges on deciding when to act and when to escalate. We model this as a decision under uncertainty: an LLM forms a prediction, estimates its probability of being correct, and compares the expected costs of acting and escalating. Using this framework across five domains of recorded human decisions (demand forecasting, content recommendation, content moderation, loan approval, and autonomous driving) and across multiple model families, we find marked differences in the implicit thresholds models use to trade off these costs. These thresholds vary substantially and are not predicted by architecture or scale, while self-estimates are miscalibrated in model-specific ways. We then test interventions that target this decision process by varying cost ratios, providing accuracy signals, and training models to follow the desired escalation rule. Prompting helps mainly for…
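The act-or-escalate comparison described in the abstract can be sketched as an expected-cost rule. This is a minimal sketch under assumed costs, not the paper's code: acting is assumed to cost nothing when correct and a fixed error cost when wrong, and escalating is assumed to cost a fixed amount. Under those assumptions the model should act only when its estimated probability of being correct exceeds 1 - C_escalate / C_error. The names Costs, decide, threshold and implicit_threshold are hypothetical; implicit_threshold shows one simple way to read off the cutoff a model appears to use from its observed choices.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Costs:
    # Hypothetical cost parameters (assumed, not taken from the paper).
    error: float     # cost incurred when the model acts and is wrong
    escalate: float  # fixed cost of handing the case to a human


def expected_cost_act(p_correct: float, costs: Costs) -> float:
    # Assumption: acting is free when correct and costs `error` when wrong.
    return (1.0 - p_correct) * costs.error


def decide(p_correct: float, costs: Costs) -> str:
    """Return 'act' if acting has strictly lower expected cost; ties escalate."""
    if expected_cost_act(p_correct, costs) < costs.escalate:
        return "act"
    return "escalate"


def threshold(costs: Costs) -> float:
    # Act iff (1 - p) * error < escalate, i.e. p > 1 - escalate / error.
    if costs.error <= 0:
        raise ValueError("error cost must be positive")
    return 1.0 - costs.escalate / costs.error


def implicit_threshold(confidences: Sequence[float], acted: Sequence[bool]) -> float:
    """Hypothetical estimator: the cutoff t for the rule 'act iff p > t'
    that disagrees with the fewest observed decisions (lowest t on ties)."""
    candidates = sorted({0.0, *confidences})
    best_t, best_err = 0.0, float("inf")
    for t in candidates:
        err = sum((p > t) != a for p, a in zip(confidences, acted))
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

For example, with an error cost of 10 and an escalation cost of 2, the threshold is 0.8, so a prediction held with 0.9 confidence is acted on and one held with 0.7 confidence is escalated. Comparing a model's implicit threshold with the cost-implied one is one way to see the kind of gap the abstract describes.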
