Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

Yun-Shiuan Chuang; Xuezhou Zhang; Yuzhe Ma; Mark K. Ho; Joseph L. Austerweil; Xiaojin Zhu

arXiv:2009.02476 · cs.LG · June 30, 2023 · 6 cites

TL;DR

This paper studies what people assume about how a reinforcement learner (specifically a Q-learning agent) learns when they teach it online with rewards and punishments, comparing human teaching in a behavioral experiment against a normative standard derived from machine teaching optimization.

Contribution

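It formulates teaching a Q-learner as a machine teaching optimization problem, solved with a deep learning approximation, to obtain a normative standard. Against this standard it measures what people assume about the learner, showing that human teaching is suboptimal and that real-time feedback about the learner's internal states helps only slightly.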
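To make the machine teaching framing concrete, here is a minimal sketch. It is not the authors' method: the paper solves the optimization with a deep learning approximation, whereas this toy uses a greedy one-step search. It assumes a tabular Q-learner, a hypothetical target Q-table the teacher wants to reach, and a teacher who picks one reward from {-1, 0, +1} per step. The teacher must plug in values for the learner's learning rate `alpha` and discount `gamma` to predict the update, which is exactly the kind of assumption the paper probes. The names `q_update` and `greedy_teach` are illustrative only.

```python
import numpy as np


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Return a copy of the Q-table after one tabular Q-learning update."""
    Q_new = Q.copy()
    # TD target: teacher-chosen reward plus discounted best next-state value
    td_target = r + gamma * np.max(Q[s_next])
    Q_new[s, a] += alpha * (td_target - Q[s, a])
    return Q_new


def greedy_teach(Q, s, a, s_next, Q_target, alpha, gamma, rewards=(-1.0, 0.0, 1.0)):
    """Hypothetical one-step teacher: pick the reward whose predicted update
    brings Q closest (in squared error) to the teacher's target Q-table."""
    best_r, best_loss = None, np.inf
    for r in rewards:
        Q_next = q_update(Q, s, a, r, s_next, alpha, gamma)
        loss = float(np.sum((Q_next - Q_target) ** 2))
        if loss < best_loss:
            best_r, best_loss = r, loss
    return best_r


# Toy example: 2 states x 2 actions; the learner just took action 0 in state 0
# and moved to state 1. The teacher wants action 1 to look better in state 0.
Q = np.zeros((2, 2))
Q_target = np.array([[-1.0, 1.0],
                     [0.0, 0.0]])
print(greedy_teach(Q, s=0, a=0, s_next=1, Q_target=Q_target, alpha=0.5, gamma=0.9))  # → -1.0
```

A greedy teacher only looks one step ahead; a full machine teaching solution would optimize the whole feedback sequence, which is what makes the problem hard enough to need approximation.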

Findings

1. People teach Q-learners relatively efficiently when the learner uses a low discount rate and a high learning rate.
2. Even so, people's teaching remains suboptimal relative to the machine teaching standard.
3. Showing teachers real-time updates of how feedback would affect the learner's internal states improves teaching only slightly.
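As a toy illustration of why a high learning rate makes reward-based teaching fast (not a reproduction of the paper's experiment), the sketch below counts how many identical rewards a single Q-value needs before it lands within a tolerance of the rewarded value. It assumes a discount of 0, so the bootstrapped next-state term drops out; the function `rewards_needed` and the tolerance of 0.05 are hypothetical.

```python
def rewards_needed(alpha, reward=1.0, tol=0.05, max_steps=10_000):
    """Count repeated rewards until Q(s, a), starting at 0, is within tol of
    `reward`. Assumes gamma = 0, so each update is Q <- Q + alpha * (r - Q)."""
    q = 0.0
    for step in range(1, max_steps + 1):
        q += alpha * (reward - q)
        if abs(reward - q) <= tol:
            return step
    return None  # did not get within tol in max_steps updates


print({a: rewards_needed(a) for a in (0.1, 0.5, 0.9)})  # → {0.1: 29, 0.5: 5, 0.9: 2}
```

After t rewards the remaining error is (1 - alpha)^t, so the count grows roughly like ln(tol) / ln(1 - alpha). A learner with a low learning rate needs many more teaching steps for the same effect.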

Abstract

Successful teaching requires an assumption of how the learner learns: how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner when they teach them in an online manner using rewards and punishment. We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment. To do so, we first establish a normative standard by formulating the problem as a machine teaching optimization problem. To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach them an idealized exploration-exploitation task? In a…
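For reference, the learning rate and discount rate the abstract refers to are the parameters of the standard tabular Q-learning update, where the teacher supplies the feedback r_t. The notation below is the textbook form and may differ from the paper's own.

```latex
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t)
  + \alpha \Big[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \Big]
```

With \alpha = 1 and \gamma = 0 the update reduces to Q_{t+1}(s_t, a_t) = r_t, so a single reward fully sets the value. A smaller \alpha or a larger \gamma makes the new value depend more on the learner's previous estimates and less directly on the teacher's reward.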

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications