Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making

Dylan M. Asmar; Mykel J. Kochenderfer

arXiv:2511.12378 · cs.AI · November 18, 2025

Open Access · 3 Reviews

TL;DR

This paper presents a Bayesian framework enabling autonomous agents to adaptively trust external suggestions in uncertain, sequential decision tasks by learning suggester reliability and strategically requesting advice.

Contribution

It introduces a Bayesian approach that models and adapts to varying suggester reliability and incorporates an explicit 'ask' action for strategic suggestion requests.

Findings

01 Robust performance across different suggester qualities
02 Effective adaptation to changing reliability
03 Strategic suggestion requesting improves decision quality

Abstract

Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provide valuable guidance but inherently vary in reliability. Existing methods for incorporating such advice typically assume static and known suggester quality parameters, limiting practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit "ask" action allowing agents to strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust…
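The Bayesian inference over suggester types described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a softmax ("noisy rational") suggester over pre-solved Q-values, which the reviews below mention, and it updates the belief on a suggestion alone, ignoring environment observations and state transitions. The names `rationality`, `reliable`, and `random`, the two-state Q-table, and the prior are all hypothetical values chosen for illustration.

```python
import math


def suggestion_likelihood(q_values, rationality):
    """P(suggested action | state, type) under an assumed softmax suggester.

    rationality = 0 gives uniformly random advice; larger values
    concentrate the advice on the action with the highest Q-value.
    """
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(rationality * (q - top)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def update_joint_belief(belief, q_table, rationalities, suggestion):
    """Bayes update of belief[(state, suggester_type)] after one suggestion."""
    posterior = {}
    for (state, stype), prior in belief.items():
        probs = suggestion_likelihood(q_table[state], rationalities[stype])
        posterior[(state, stype)] = prior * probs[suggestion]
    norm = sum(posterior.values())
    return {key: value / norm for key, value in posterior.items()}


def marginal_over_types(belief):
    """Sum out the environment state to get the belief over suggester types."""
    marginal = {}
    for (_, stype), prob in belief.items():
        marginal[stype] = marginal.get(stype, 0.0) + prob
    return marginal


# Hypothetical toy problem: action 0 is best in s0, action 1 is best in s1.
q_table = {"s0": [1.0, 0.0], "s1": [0.0, 1.0]}
rationalities = {"reliable": 5.0, "random": 0.0}

# The agent believes it is probably in s0 and is unsure about the suggester.
belief = {
    ("s0", "reliable"): 0.45,
    ("s1", "reliable"): 0.05,
    ("s0", "random"): 0.45,
    ("s1", "random"): 0.05,
}

agrees = marginal_over_types(update_joint_belief(belief, q_table, rationalities, 0))
disagrees = marginal_over_types(update_joint_belief(belief, q_table, rationalities, 1))
```

In this toy setup, a suggestion that matches the action the agent already considers best raises the probability of the `reliable` type above its prior of 0.5, while a conflicting suggestion lowers it. Because the belief is joint over state and type, the same update also shifts the state estimate, which is how reliance on advice can be adjusted without fixing suggester quality in advance.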

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

The paper is well-written and easy to follow. The model is well explained and accompanied by nice intuitions.

Weaknesses

My main concern is the contribution of the paper: the paper should be viewed more as a "conceptual" work. As noted above, the model is newly proposed and well-explained, but I find it hard to apply in a real-world scenario, for the following reasons:
- Solving such a model requires knowing many parameters, such as the transition matrix, the noisy rational suggester model, etc.
- Generally, the POMDP framework makes the model inapplicable to a real-world scenario with a moderate state space

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Novel Motivation: (1) Addresses a real and growing need in human-AI teaming: adapting trust to variable advice quality. (2) Aligned with the trend of interactive assistance and trust calibration.
- Formulation: POMDP and MOMDP are modeled in the scenarios: (1) Presents a solid use of the MOMDP structure to efficiently manage the expanded state space introduced by modeling suggester reliability as a latent variable. (2) The Bayesian update mechanism for jointly inferring environment state and

Weaknesses

- Human study missing: Although the motivation is human trust, no human-in-the-loop experiments are conducted. The paper acknowledges this, but the gap remains important.
- Scalability: Tag and RockSample are standard but small. What about (1) larger POMDP domains, (2) higher-dimensional latent human models, or (3) multiple suggesters or groups of helpers?
- Reliance on **known** Q-values: The ask suggestion model uses pre-solved Q values. What if (1) Q is inaccurate, (2) Q value needs to b

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The method section (Section 3) is easy to follow, and the proposed contributions/components are introduced clearly with motivations. The paper also runs experiments in the setting where the proposed suggester model is misspecified (Section 5.4).

Weaknesses

The contributions/components (i)–(iii) listed in the summary box are somewhat orthogonal, especially (i)–(ii) relative to (iii). Without comprehensive empirical experiments demonstrating a significant performance improvement over justified baselines, the overall contribution looks like a sum of incremental components. Further, proper ablation studies are critical in this case to understand the strengths and weaknesses of the individual components (Tables 1–2 may touch on this, but it is di

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Human-Automation Interaction and Safety · Decision-Making and Behavioral Economics