Utility-based Adaptive Teaching Strategies using Bayesian Theory of Mind

Clémence Grislain, Hugo Caselles-Dupré, Olivier Sigaud, Mohamed Chetouani

arXiv:2309.17275 · cs.LG · October 2, 2023

TL;DR

This paper introduces Bayesian Theory of Mind-based teacher agents that adaptively tailor their teaching strategies to individual learners, resulting in more efficient learning in simulated environments.

Contribution

The paper presents a novel Bayesian ToM framework for designing adaptive teaching agents that model learners' internal states to optimize their teaching strategies.
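The framework rests on a Bayesian update of the teacher's belief about the learner's internal state. The sketch below is a minimal illustration, not the authors' implementation: it assumes a small, discrete set of hypothetical learner types (`full_view`, `partial_view`), each with a known toy policy, and updates a belief over them from observed `(state, action)` pairs using Bayes' rule. All names and probabilities here are illustrative assumptions.

```python
import math

# Hypothetical learner types, each with a known policy: state -> {action: probability}.
# These toy policies are illustrative assumptions, not the paper's learner models.
POLICIES = {
    "full_view": {"s0": {"right": 0.9, "left": 0.1}, "s1": {"right": 0.9, "left": 0.1}},
    "partial_view": {"s0": {"right": 0.5, "left": 0.5}, "s1": {"right": 0.5, "left": 0.5}},
}
PRIOR = {"full_view": 0.5, "partial_view": 0.5}


def update_belief(belief, policies, observations):
    """Return the posterior over learner types after observing (state, action) pairs.

    Bayes' rule: posterior(type) is proportional to prior(type) times the product
    of policy[type](action | state) over all observations. Computed in log space
    to avoid underflow on long trajectories.
    """
    log_post = {}
    for learner_type, prior in belief.items():
        lp = math.log(prior) if prior > 0 else -math.inf
        for state, action in observations:
            p = policies[learner_type].get(state, {}).get(action, 0.0)
            lp += math.log(p) if p > 0 else -math.inf
        log_post[learner_type] = lp

    best = max(log_post.values())
    if best == -math.inf:
        raise ValueError("observations have zero probability under every learner type")
    # Subtract the maximum before exponentiating, then normalise so the belief sums to 1.
    weights = {t: math.exp(lp - best) for t, lp in log_post.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


if __name__ == "__main__":
    observed = [("s0", "right"), ("s1", "right")]
    print(update_belief(PRIOR, POLICIES, observed))
```

With a uniform prior and two observed `right` moves, the belief shifts to roughly 0.76 for `full_view` versus 0.24 for `partial_view`, because that type predicts the observed actions more strongly.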

Findings

01. Learners taught by ToM-equipped teachers learn more efficiently.

02. Better alignment of the teacher's model with the learner improves teaching effectiveness.

03. Adaptive teaching strategies outperform learner-agnostic approaches.

Abstract

Good teachers always tailor their explanations to the learners. Cognitive scientists model this process under the rationality principle: teachers try to maximise the learner's utility while minimising teaching costs. To this end, human teachers seem to build mental models of the learner's internal state, a capacity known as Theory of Mind (ToM). Inspired by cognitive science, we build on Bayesian ToM mechanisms to design teacher agents that, like humans, tailor their teaching strategies to the learners. Our ToM-equipped teachers construct models of learners' internal states from observations and leverage them to select demonstrations that maximise the learners' rewards while minimising teaching costs. Our experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way. This effect gets stronger when the…
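The rationality principle described in the abstract (maximise the learner's utility while minimising teaching costs) can be read as choosing the demonstration d that maximises Σ_type b(type)·R(type, d) − α·C(d), where b is the teacher's belief about the learner. The Python sketch below is a hedged illustration of that selection rule only; the belief values, rewards, costs, the weight `alpha`, and names such as `select_demonstration` are hypothetical and do not come from the paper or its repository.

```python
# Hypothetical setting: two demonstrations a teacher could show and two learner types.
# Reward and cost numbers are made up for illustration only.
EXPECTED_REWARD = {
    ("full_view", "short_demo"): 0.9,
    ("full_view", "long_demo"): 0.95,
    ("partial_view", "short_demo"): 0.3,
    ("partial_view", "long_demo"): 0.9,
}
COST = {"short_demo": 0.1, "long_demo": 0.4}


def teaching_utility(belief, demo, alpha=1.0):
    """Expected learner reward under the teacher's belief, minus alpha times the demo cost."""
    gain = sum(p * EXPECTED_REWARD[(learner_type, demo)] for learner_type, p in belief.items())
    return gain - alpha * COST[demo]


def select_demonstration(belief, demos, alpha=1.0):
    """Pick the demonstration with the highest utility (a rational-teacher choice)."""
    return max(demos, key=lambda d: teaching_utility(belief, d, alpha))


if __name__ == "__main__":
    demos = ["short_demo", "long_demo"]
    # A teacher fairly sure the learner sees the whole scene picks the cheaper demo...
    print(select_demonstration({"full_view": 0.9, "partial_view": 0.1}, demos))
    # ...while one that believes the learner has a limited view pays for the longer demo.
    print(select_demonstration({"full_view": 0.2, "partial_view": 0.8}, demos))
```

The same teacher selects different demonstrations depending on its belief about the learner, which is the adaptive behaviour the abstract contrasts with learner-agnostic teaching.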

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 2

Strengths

(1) The paper proposed to use a Bayesian framework to model the student's mental state, and then utilized the learned knowledge to help improve the learning process of the student in a complicated POMDP environment. This approach is reasonable and sound. (2) The paper performed extensive empirical explorations to demonstrate that the proposed method indeed helped speed up the learning process of the student model. The results look great and are convincing.

Weaknesses

(1) It's not clear why the method requires two learning environments: a simple one in which the teacher interacts with the learner and gains knowledge of the student's mental state, and a more complex one in which the teacher performs the actual teaching. Is it possible to unify these two and let the teacher teach on the fly as it interacts with the student? (2) If there are significant distinctions between the simple and complex environments, how do they affect the teaching process? Is the knowledge le

Reviewer 02 · Rating 1 (strong reject) · Confidence 5

Strengths

- The paper tackles an interesting problem: how to formalize teaching goal-directed agents.
- To the extent the teaching one intends is of humans, the interest in cognitive science is laudable.

Weaknesses

The paper does not engage with, or obviously contribute to, the large literature on models of teaching in machine learning (or cognitive science). Specifically, the introduction is entirely focused on one qualitative theoretical perspective in the cognitive science literature on teaching, without mentioning the extensive literature on formalizing teaching and cooperation. This literature has grown large and quite mature in recent years, including extensive mathematical and computational theories

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

+ The paper is very well-written and easy to follow.
+ Experimental results are encouraging, as teachers taking advantage of ToM to customize the guidance outperformed the ones that do not utilize ToM.
+ The approach is simple, and the authors provided a git repository including notebooks for fast adoption.

Weaknesses

- Generalizability: The proposed ToM technique assumes access to an approximate policy of the student. Also, all environments discussed in the paper were deterministic. In practice, while the set of student goals is limited, the policy students follow may be far from ideal, and stochasticity may further confuse the teacher as it tries to reach a reasonable belief. It would be great to discuss these limitations in the paper.
- Computational Complexity Analysis: Given the calculation of the belief over

Code & Models

Repositories

teacher-with-tom/utility_based_adaptive_teaching (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning