Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use
Changkun Ou

TL;DR
This paper formalizes trust calibration for agentic tool use as a preference-learning problem. It places a Gaussian-process model on human approval decisions and escalates an agent's proposed action to a human when the approval outcome is most uncertain.
Contribution
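It models human risk tolerance as a latent function with a Gaussian-process prior, learned from binary approve/deny feedback. It then shows that the resulting policy gateway is an instance of Preferential Bayesian Optimization, whose objective is to classify actions rather than to optimize a design.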
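For reference, the model named in the abstract can be written in standard Gaussian-process classification notation. The symbols are assumptions about notation, not the paper's own equations: a latent function f over action features x, a kernel k, labels y_i = +1 for approve and -1 for deny, and a Gaussian approximation q to the posterior.

```latex
\begin{aligned}
f &\sim \mathcal{GP}\big(0,\, k(x, x')\big), \\
p\big(y_i \mid f(x_i)\big) &= \Phi\big(y_i\, f(x_i)\big), \qquad y_i \in \{-1, +1\}, \\
q\big(f(x_*) \mid \mathcal{D}\big) &= \mathcal{N}\big(\mu_*,\, \sigma_*^2\big), \\
\pi(x_*) := p(y_* = +1 \mid \mathcal{D}) &\approx \int \Phi(f)\, \mathcal{N}\big(f;\, \mu_*, \sigma_*^2\big)\, df
  = \Phi\!\left(\frac{\mu_*}{\sqrt{1 + \sigma_*^2}}\right).
\end{aligned}
```

The final equality is exact for a Gaussian q. The approximation enters only where the true, non-Gaussian posterior is replaced by q.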
Findings
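The method partitions the action space into allow, block, and ask regions.
It uses approximate Gaussian-process classification and gains sample efficiency by querying the human only where the approval outcome is most uncertain.
The approach is structurally an instance of Preferential Bayesian Optimization: it inherits that framework's inference machinery but differs in objective.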
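The allow/block/ask partition can be sketched as a thresholding rule on the predictive approval probability. This is a minimal sketch under assumptions the summary does not state. It assumes the posterior of the latent function at an action is Gaussian with mean mu and variance var, and that approval probability follows the probit formula Phi(mu / sqrt(1 + var)). The thresholds allow_at = 0.9 and block_at = 0.1 are hypothetical. The paper may define the ask region differently, for example by posterior variance alone.

```python
import math


def approval_probability(mu: float, var: float) -> float:
    """Probit predictive probability Phi(mu / sqrt(1 + var)) for f(x) ~ N(mu, var)."""
    z = mu / math.sqrt(1.0 + var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gate(mu: float, var: float, allow_at: float = 0.9, block_at: float = 0.1) -> str:
    """Map an action's posterior (mu, var) to one of three regions.

    allow_at and block_at are hypothetical thresholds, not values from the paper.
    """
    p = approval_probability(mu, var)
    if p >= allow_at:
        return "allow"  # confidently within the human's risk tolerance
    if p <= block_at:
        return "block"  # confidently outside it
    return "ask"  # approval outcome uncertain: escalate to the human


for mu, var in [(1.5, 0.0), (1.5, 3.0), (-3.0, 0.1)]:
    print(mu, var, gate(mu, var))
```

With the same posterior mean, raising the variance moves an action from allow to ask. For example, gate(1.5, 0.0) returns "allow" but gate(1.5, 3.0) returns "ask", which is the escalate-under-uncertainty behavior described above.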
Abstract
We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, observed through a probit likelihood on binary approve/deny feedback, and escalates to the human exactly where the approval outcome is most uncertain. We show this is structurally an instance of Preferential Bayesian Optimization, inheriting its inference machinery (approximate Gaussian-process classification) and its sample-efficiency argument (uncertainty-targeted querying), while differing in objective: classifying an action space into allow/block/ask regions rather than optimizing a design.
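The inference and escalation steps can be sketched as follows. This is a minimal sketch, not the paper's implementation, and it rests on three assumptions. First, it uses a squared-exponential kernel over numeric action features. Second, it takes the Laplace approximation as the "approximate Gaussian-process classification". This is the Newton-based posterior-mode method from chapter 3 of Rasmussen and Williams, Gaussian Processes for Machine Learning. Third, it treats "most uncertain" as the candidate whose approval probability is closest to 0.5. The abstract does not say which approximation the paper uses, and expectation propagation is a common alternative. The function names rbf_kernel, laplace_fit, laplace_predict, and next_query are hypothetical.

```python
import math
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel over action feature vectors (an assumed choice).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


# Standard normal CDF (Phi) built from the standard library's math.erf, and the PDF.
_std_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def _std_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def _probit_derivs(f, y):
    # Gradient and negated Hessian diagonal of sum_i log Phi(y_i * f_i), y_i in {-1, +1}.
    # Not hardened for very negative y_i * f_i; adequate for small demos.
    z = y * f
    ratio = _std_pdf(f) / _std_cdf(z)  # N(f_i) / Phi(y_i f_i)
    grad = y * ratio
    W = ratio**2 + z * ratio  # positive because log Phi is concave
    return grad, W


def _factor(K, W):
    # Cholesky factor of B = I + W^1/2 K W^1/2; eigenvalues of B are >= 1, so this is stable.
    sW = np.sqrt(W)
    return sW, np.linalg.cholesky(np.eye(len(W)) + sW[:, None] * K * sW[None, :])


def laplace_fit(X, y, kernel, iters=100, tol=1e-9):
    """Newton iterations for the posterior mode of f given approve (+1) / deny (-1) labels."""
    K = kernel(X, X)
    f = np.zeros(len(y))
    for _ in range(iters):
        grad, W = _probit_derivs(f, y)
        sW, L = _factor(K, W)
        b = W * f + grad
        c = np.linalg.solve(L, sW * (K @ b))
        a = b - sW * np.linalg.solve(L.T, c)
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    grad, W = _probit_derivs(f, y)
    sW, L = _factor(K, W)
    return {"X": X, "grad": grad, "sW": sW, "L": L}


def laplace_predict(model, Xs, kernel):
    """Approximate posterior mean/variance of f and approval probability at candidate actions."""
    Ks = kernel(model["X"], Xs)
    mu = Ks.T @ model["grad"]
    v = np.linalg.solve(model["L"], model["sW"][:, None] * Ks)
    var = np.diag(kernel(Xs, Xs)) - np.sum(v**2, axis=0)
    p = _std_cdf(mu / np.sqrt(1.0 + var))
    return mu, var, p


def next_query(p):
    # Uncertainty-targeted escalation: the candidate whose approval is closest to a coin flip.
    return int(np.argmin(np.abs(p - 0.5)))


# Toy usage with one hypothetical "risk score" feature: low scores approved, high ones denied.
X = np.array([[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
candidates = np.linspace(0.0, 4.0, 41)[:, None]
model = laplace_fit(X, y, rbf_kernel)
mu, var, p = laplace_predict(model, candidates, rbf_kernel)
print("escalate at risk score", candidates[next_query(p), 0])
```

Maximizing the predictive entropy of the binary approval outcome would select the same candidate, because that entropy peaks at probability 0.5. A criterion based only on the posterior variance of f could select a different action.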
