Learning to Make Adherence-Aware Advice

Guanting Chen; Xiaocheng Li; Chunlin Sun; Hanzhao Wang

arXiv:2310.00817 · stat.ML · March 22, 2024 · 2 cites

TL;DR

This paper introduces a sequential decision-making model for AI advice systems that accounts for the human's adherence level and includes a defer option. Its learning algorithms learn when to give advice and improve on problem-agnostic reinforcement learning methods.

Contribution

It proposes a novel model that incorporates the human's adherence level and a defer option, along with specialized learning algorithms that show better convergence and empirical performance than problem-agnostic reinforcement learning algorithms.

Findings

1. Algorithms learn optimal advice policies effectively.
2. Specialized algorithms outperform generic reinforcement learning.
3. Model adapts advice timing based on human adherence levels.

Abstract

As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from the suboptimal AI policies due to the inadequate consideration of humans disregarding AI recommendations, as well as the need for AI to provide advice selectively when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from making advice. We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical…
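The two modeling ingredients named in the abstract, the adherence level and the defer option, can be illustrated with a small planning sketch. This is not the authors' learning algorithm: it assumes a known tabular finite-horizon MDP, assumes the human falls back to a fixed default policy whenever advice is rejected or withheld, and breaks ties in favor of deferring. All names (`DEFER`, `adherence_aware_value_iteration`, `human_policy`, `theta`) are hypothetical.

```python
import numpy as np

DEFER = -1  # hypothetical marker meaning "give no advice at this step"


def adherence_aware_value_iteration(P, R, human_policy, theta, horizon):
    """Backward induction for an advice policy (illustrative sketch only).

    P            : (S, A, S) transition probabilities
    R            : (S, A) rewards for the action the human actually takes
    human_policy : (S, A) the human's default action distribution (assumed fixed)
    theta        : (S, A) adherence level, P(human follows advice a in state s)
    horizon      : number of decision steps
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    advice = np.full((horizon, S), DEFER, dtype=int)
    for h in reversed(range(horizon)):
        # Value of the human actually playing action a in state s.
        Q = R + P @ V
        # Defer: the human acts according to the default policy.
        q_defer = (human_policy * Q).sum(axis=1)
        # Advise a: followed with prob. theta, otherwise default behavior.
        q_advise = theta * Q + (1.0 - theta) * q_defer[:, None]
        best = q_advise.argmax(axis=1)
        best_val = q_advise[np.arange(S), best]
        # Assumption: advise only when it strictly improves on deferring.
        give = best_val > q_defer + 1e-12
        advice[h] = np.where(give, best, DEFER)
        V = np.where(give, best_val, q_defer)
    return V, advice


if __name__ == "__main__":
    # Two absorbing states; the human always plays action 0 by default.
    P = np.zeros((2, 2, 2))
    P[0, :, 0] = 1.0
    P[1, :, 1] = 1.0
    R = np.array([[0.0, 1.0], [1.0, 1.0]])
    human_policy = np.array([[1.0, 0.0], [1.0, 0.0]])
    theta = np.full((2, 2), 0.5)
    V, advice = adherence_aware_value_iteration(P, R, human_policy, theta, 2)
    print(V)
    print(advice)
```

In this toy instance, advice helps only in state 0, where the human's default action is suboptimal; in state 1 every action earns the same reward, so the sketch defers there. This mirrors the idea of advising only at critical time stamps, under the stated assumptions.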

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- This paper deals with the important topic of human-AI collaboration, a topic that is largely overlooked by the greater ML community in favor of fully-autonomous approaches. In particular, advice-taking is an important setting of human-AI collaboration, and the formalization of this setting as a CMDP, while straightforward, presents an important step toward making progress on this important problem.
- Overall, the presentation is highly legible and the key ideas are explained clearly.

Weaknesses

While this paper studies an important topic, the paper should be improved before acceptance to a top-tier conference like ICLR:

- The experimental setting is extremely simplistic. The Flappy Bird MDP effectively consists of just 2 actions (up or down). The exact layout of the MDP also appears fixed. Likewise, the advice-taking policies are fixed as two hard-coded policies. Effectively, the problem then reduces to learning a policy to solve 2 static MDPs with very small action spaces. Ideally the stu

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

This work investigates a novel reinforcement learning model taking into account new realistic factors, including the human's adherence level and the defer option. The corresponding algorithm design and the theoretical analysis are novel to reinforcement learning. The empirical studies also verify the algorithms' performance. In addition, the paper is well-structured, and the main idea of this work is easy to follow.

Weaknesses

(1) The major concern about this work is that there is a gap between the result for $\mathcal{E}_1$ in Algorithm 1 and that for $\mathcal{E}_2$ in Algorithm 3. The dependence on $S$ is typically more important than $H$ in the reinforcement learning problem. As claimed by the authors, the sample complexity bound for Algorithm 3 is sharper in the dependence on S than for Algorithm 1. But $\mathcal{E}_2$ should be a harder problem than $\mathcal{E}_1$, and thus the sample complexity bound for $\mat

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. **Problem formulation:** This paper proposes a new formulation of advising by assuming a fixed human player with a fixed probability of taking advice. This problem formulation to me is novel by considering the human adherence level.
2. **Theoretical analysis:** The paper adopts a few theoretical analysis frameworks to this advising problem and derives much-improved convergence bounds by leveraging the problem structures.
3. **Empirical studies:** It is nice to see the algorithm really wor

Weaknesses

My biggest complaint about this paper is its presentation. A few comments are listed below.

1. It is weird that there is no citation at all in the introduction section. The introduction part is also particularly short with the contribution statements even deferred after the related work section. I would strongly encourage the authors to expand the introduction with more detailed explanations and more intuitions.
2. I think the "**Theoretical reinforcement learning**" section should be expande

Videos

Learning to Make Adherence-aware Advice · SlidesLive

Taxonomy

Topics: Ethics and Social Impacts of AI · Reinforcement Learning in Robotics · Auction Theory and Applications