Deciding What to Learn: A Rate-Distortion Approach
Dilip Arumugam, Benjamin Van Roy

TL;DR
This paper introduces a rate-distortion-based framework in which an agent autonomously balances the information it must acquire against the sub-optimality of the resulting policy, guided by a single preference hyperparameter set by the designer.
Contribution
It presents a rate-distortion approach in which the agent computes its own learning targets from the designer's preferences, so the designer no longer has to translate those preferences into a fixed learning target.
Findings
Established a bound on expected discounted regret.
Demonstrated the method's ability to express designer preferences.
Showed improvements over Thompson sampling in experiments.
Abstract
Agents that learn to select optimal actions represent a prominent focus of the sequential decision-making literature. In the face of a complex environment or constraints on time and resources, however, aiming to synthesize such an optimal policy can become infeasible. These scenarios give rise to an important trade-off between the information an agent must acquire to learn and the sub-optimality of the resulting policy. While an agent designer has a preference for how this trade-off is resolved, existing approaches further require that the designer translate these preferences into a fixed learning target for the agent. In this work, leveraging rate-distortion theory, we automate this process such that the designer need only express their preferences via a single hyperparameter and the agent is endowed with the ability to compute its own learning targets that best achieve the desired…
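The trade-off described in the abstract, between the information an agent must acquire and the sub-optimality of the resulting policy, has the shape of a classical rate-distortion problem. As a minimal sketch, and not the paper's own algorithm (the truncated abstract does not say how learning targets are computed), the classical Blahut-Arimoto iteration below traces that trade-off for a toy three-armed problem. Everything in it is an illustrative assumption: the function name `blahut_arimoto`, the hyperparameter name `beta`, the uniform prior, and the regret-style distortion matrix.

```python
import numpy as np


def blahut_arimoto(p_x, distortion, beta, iters=200):
    """Classical Blahut-Arimoto iteration (illustrative sketch).

    p_x        : prior over environments, shape (n_env,)
    distortion : distortion[x, a] = regret of targeting action a in environment x
    beta       : hypothetical preference knob; larger values favor low regret
                 at the price of more information (a higher rate)
    Returns (conditional p(a|x), rate in nats, expected distortion).
    """
    p_x = np.asarray(p_x, dtype=float)
    d = np.asarray(distortion, dtype=float)
    n_targets = d.shape[1]
    q = np.full(n_targets, 1.0 / n_targets)  # marginal over targets

    # Shifting each row by its minimum avoids underflow and cancels
    # when the rows are normalized.
    shifted = d - d.min(axis=1, keepdims=True)
    for _ in range(iters):
        weights = q[None, :] * np.exp(-beta * shifted)
        cond = weights / weights.sum(axis=1, keepdims=True)
        q = p_x @ cond

    # Mutual information I(X; A) in nats, treating 0 * log 0 as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(cond > 0, np.log(cond / q[None, :]), 0.0)
    rate = float(np.sum(p_x[:, None] * cond * log_ratio))
    expected_distortion = float(np.sum(p_x[:, None] * cond * d))
    return cond, rate, expected_distortion


if __name__ == "__main__":
    # Toy setup (assumed): environment x means "arm x is optimal".
    p_x = np.array([1 / 3, 1 / 3, 1 / 3])
    d = np.array([
        [0.0, 0.1, 0.5],
        [0.1, 0.0, 0.5],
        [0.5, 0.5, 0.0],
    ])
    for beta in (0.0, 1.0, 5.0, 100.0):
        _, rate, dist = blahut_arimoto(p_x, d, beta)
        print(f"beta={beta:6.1f}  rate={rate:.4f} nats  distortion={dist:.4f}")
```

In this toy setup, a small `beta` yields a target that requires almost no information but tolerates more regret, while a large `beta` approaches identifying the optimal arm exactly. This mirrors the role the abstract assigns to its single hyperparameter, though the correspondence to the paper's actual parameterization is an assumption.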
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
