TL;DR
This paper introduces Prospective Learning with Control (PLuC), a new framework that extends supervised learning to non-stationary, reset-free environments, demonstrating its theoretical optimality and practical advantages over traditional RL in foraging tasks.
Contribution
The paper develops a novel prospective learning framework, proves its asymptotic optimality, and shows its superior performance in non-stationary environments compared to standard RL methods.
Findings
Under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy.
Modern RL algorithms struggle in non-stationary, reset-free environments and converge more slowly there.
PLuC outperforms RL in a 1-D foraging benchmark.
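To make the ERM finding concrete, here is a minimal sketch of empirical risk minimization over a finite hypothesis class whose members take time as input. It is not the paper's algorithm or benchmark. The two-patch environment, the switching rule, the hypothesis class, and every name in the code (`rich_patch`, `make_hypotheses`, `erm`, `TRUE_PERIOD`) are assumptions chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A hypothesis maps a time step to a predicted patch index (0 or 1).
Hypothesis = Callable[[int], int]
Dataset = List[Tuple[int, int]]  # (time step, index of the rich patch)


def make_hypotheses(periods: List[int]) -> Dict[str, Hypothesis]:
    """Build a small finite class: two stationary rules plus time-aware rules."""
    hyps: Dict[str, Hypothesis] = {
        "always_0": lambda t: 0,
        "always_1": lambda t: 1,
    }
    for p in periods:
        # Bind p as a default argument so each lambda keeps its own period.
        hyps[f"switch_every_{p}"] = lambda t, p=p: (t // p) % 2
    return hyps


def empirical_risk(h: Hypothesis, data: Dataset) -> float:
    """Average 0-1 loss of hypothesis h on the observed (time, label) pairs."""
    return sum(h(t) != y for t, y in data) / len(data)


def erm(hyps: Dict[str, Hypothesis], data: Dataset) -> str:
    """Return the name of the hypothesis with the lowest empirical risk."""
    return min(hyps, key=lambda name: empirical_risk(hyps[name], data))


# Assumed toy environment: the rich patch alternates every TRUE_PERIOD steps
# and there are no resets; time simply keeps running.
TRUE_PERIOD = 5


def rich_patch(t: int) -> int:
    return (t // TRUE_PERIOD) % 2


past: Dataset = [(t, rich_patch(t)) for t in range(40)]
future: Dataset = [(t, rich_patch(t)) for t in range(40, 80)]

hyps = make_hypotheses([3, 5, 7])
best = erm(hyps, past)

# Risk on future time steps for the ERM choice versus the best stationary rule.
future_risk = empirical_risk(hyps[best], future)
stationary_risk = min(
    empirical_risk(hyps[name], future) for name in ("always_0", "always_1")
)

print(best, future_risk, stationary_risk)
```

In this toy setting the time-aware rule selected on past data keeps predicting correctly on future steps, while any rule that ignores time is right only half the time. The sketch covers prediction only; it does not model how an agent's actions affect the data it sees, which is the control aspect the paper addresses.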
Abstract
Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments. Using this framework, called "Prospective Learning with Control" (PLuC), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control: foraging, a canonical task relevant to both natural and artificial agents. We illustrate that modern RL algorithms, which…
