Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models
Matthew DosSantos DiSorbo, Harang Ju

TL;DR
This paper frames a language model's choice between acting and escalating in automation tasks as a decision under uncertainty, then analyzes decision thresholds, calibration, and interventions across five domains and multiple model families.
Contribution
It introduces a decision framework for escalation in LLMs, evaluates implicit thresholds and calibration, and proposes training methods to improve decision robustness.
Findings
Implicit escalation thresholds vary substantially across models and domains and are not predicted by model size or architecture.
Models' self-estimates of correctness are often miscalibrated, in model-specific ways.
Chain-of-thought fine-tuning yields robust escalation policies that generalize well.
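One common way to quantify the miscalibration noted in the findings is expected calibration error (ECE): group predictions by stated confidence and compare each group's average confidence with its observed accuracy. The sketch below is illustrative only. The function name, the equal-width binning, and the choice of ECE itself are assumptions, not necessarily the metric used in the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE (assumed metric, not confirmed by the paper).

    confidences: self-estimated probability of being correct, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign each prediction to a confidence bin; p == 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(p for p, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of the data.
        ece += (len(items) / total) * abs(mean_conf - accuracy)
    return ece


# A model that always reports 0.9 but is right half the time is overconfident.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
well_calibrated = expected_calibration_error([1.0, 1.0], [1, 1])
```

A value near zero means stated confidence tracks accuracy. A large value means the model's self-estimate would be a poor input to any act-or-escalate rule that depends on it.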
Abstract
Effective automation hinges on deciding when to act and when to escalate. We model this as a decision under uncertainty: an LLM forms a prediction, estimates its probability of being correct, and compares the expected costs of acting and escalating. Using this framework across five domains of recorded human decisions (demand forecasting, content recommendation, content moderation, loan approval, and autonomous driving) and across multiple model families, we find marked differences in the implicit thresholds models use to trade off these costs. These thresholds vary substantially and are not predicted by architecture or scale, while self-estimates are miscalibrated in model-specific ways. We then test interventions that target this decision process by varying cost ratios, providing accuracy signals, and training models to follow the desired escalation rule. Prompting helps mainly for…
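The act-or-escalate comparison described in the abstract can be sketched as a simple expected-cost rule. This is a minimal illustration under stated assumptions, not the authors' implementation: acting correctly is assumed to cost nothing, acting incorrectly costs a fixed `error` amount, and escalating costs a fixed `escalate` amount. The names `Costs`, `act_threshold`, and `decide` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    error: float     # assumed cost of acting on a wrong prediction
    escalate: float  # assumed fixed cost of handing the case to a human


def act_threshold(costs: Costs) -> float:
    """Lowest probability of being correct at which acting is justified.

    Acting has expected cost (1 - p) * error; escalating costs `escalate`.
    Acting is no worse when (1 - p) * error <= escalate, which rearranges
    to p >= 1 - escalate / error.
    """
    if costs.error <= 0:
        raise ValueError("error cost must be positive")
    return max(0.0, 1.0 - costs.escalate / costs.error)


def decide(p_correct: float, costs: Costs) -> str:
    """Return 'act' or 'escalate' by comparing expected costs."""
    expected_cost_of_acting = (1.0 - p_correct) * costs.error
    return "act" if expected_cost_of_acting <= costs.escalate else "escalate"


# Example: an error is five times as costly as an escalation.
costs = Costs(error=10.0, escalate=2.0)
threshold = act_threshold(costs)
confident_choice = decide(0.9, costs)
uncertain_choice = decide(0.5, costs)
```

Under these assumptions, the cost ratio alone fixes the threshold. A model whose observed behavior implies a different cutoff, or whose `p_correct` is miscalibrated, will deviate from this rule.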
