A Teacher-Student Markov Decision Process-based Framework for Online Correctional Learning
Inês Lourenço, Rebecka Winqvist, Cristian R. Rojas, Bo Wahlberg

TL;DR
This paper introduces a Markov decision process-based framework for correctional learning, where a teacher optimally intervenes in a student's data collection process to reduce estimation variance under resource constraints.
Contribution
The paper formulates the online correctional learning problem as a Markov decision process and derives the optimal teacher policy using dynamic programming.
Findings
Optimal online policy effectively reduces estimation variance.
Comparison shows benefits over batch correction methods.
Numerical experiments validate the framework's effectiveness.
Abstract
A classical learning setting typically concerns an agent/student who collects data, or observations, from a system in order to estimate a certain property of interest. Correctional learning is a type of cooperative teacher-student framework where a teacher, who has partial knowledge about the system, has the ability to observe and alter (correct) the observations received by the student in order to improve the accuracy of its estimate. In this paper, we show how the variance of the estimate of the student can be reduced with the help of the teacher. We formulate the corresponding online problem - where the teacher has to decide, at each time instant, whether or not to change the observations due to a limited budget - as a Markov decision process, from which the optimal policy is derived using dynamic programming. We validate the framework in numerical experiments, and compare the…
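As a rough illustration of the kind of budget-constrained dynamic program the abstract describes, the sketch below solves a toy problem by backward induction. Everything specific in it is an assumption made for illustration and is not taken from the paper: the student estimates the mean of a Bernoulli source from `horizon` binary observations, the teacher knows the true parameter `theta`, each correction flips one observation and costs one unit of `budget`, and the objective is the expected squared error of the student's final sample mean. The function name `teacher_value` is hypothetical.

```python
from functools import lru_cache


def teacher_value(theta, horizon, budget):
    """Expected final squared error under the optimal flip policy (toy model).

    Assumed state: (t, ones, b) = time step, number of ones the student
    has received so far, and remaining correction budget.
    """

    @lru_cache(maxsize=None)
    def value(t, ones, b):
        # Terminal cost: squared error of the student's sample mean.
        if t == horizon:
            return (ones / horizon - theta) ** 2

        expected = 0.0
        # Average over the next raw observation x ~ Bernoulli(theta).
        for x, prob in ((1, theta), (0, 1.0 - theta)):
            # Option 1: pass the observation through unchanged.
            best = value(t + 1, ones + x, b)
            # Option 2: flip it, if any budget remains.
            if b > 0:
                flipped = value(t + 1, ones + (1 - x), b - 1)
                best = min(best, flipped)
            expected += prob * best
        return expected

    return value(0, 0, budget)


if __name__ == "__main__":
    for b in range(4):
        print(b, teacher_value(0.3, 20, b))
```

With a zero budget the teacher can never intervene, so the returned value reduces to the usual variance of the sample mean, `theta * (1 - theta) / horizon`. A larger budget can only lower it, since never flipping remains an admissible policy.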
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing
