
TL;DR
This paper introduces AIQI, a novel model-free universal reinforcement learning agent that is proven to be asymptotically near-optimal, expanding the scope of universal agents beyond model-based approaches.
Contribution
AIQI is the first model-free universal RL agent with proven asymptotic near-optimality, using universal induction over action-value functions instead of models.
Findings
Under a grain of truth condition, AIQI is strong asymptotically ε-optimal and asymptotically ε-Bayes-optimal.
AIQI performs universal induction over distributional action-value functions.
Results expand the diversity of known universal agents.
Abstract
In general reinforcement learning, all established optimal agents, including AIXI, are model-based, explicitly maintaining and using environment models. This paper introduces Universal AI with Q-Induction (AIQI), the first model-free agent proven to be asymptotically ε-optimal in general RL. AIQI performs universal induction over distributional action-value functions, instead of policies or environments like previous works. Under a grain of truth condition, we prove that AIQI is strong asymptotically ε-optimal and asymptotically ε-Bayes-optimal. Our results significantly expand the diversity of known universal agents.
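To make "induction over distributional action-value functions" concrete, here is a toy sketch of Bayesian induction over a small, finite class of candidate return distributions. It is not the AIQI construction from the paper: the hypothesis class, the two return bins, the action names, and every function name below are assumptions chosen for illustration, and the history-independent, two-hypothesis setting is far simpler than general RL.

```python
# Toy illustration only: Bayesian induction over a finite, hand-written class of
# distributional action-value hypotheses. All names and numbers are hypothetical.

RETURN_BINS = [0.0, 1.0]  # discretized return values (assumed for the example)

# Each hypothesis maps an action to a probability distribution over RETURN_BINS.
HYPOTHESES = {
    "h_left_good": {"left": [0.2, 0.8], "right": [0.8, 0.2]},
    "h_right_good": {"left": [0.8, 0.2], "right": [0.2, 0.8]},
}


def bayes_update(weights, action, return_bin):
    """Reweight each hypothesis by the probability it gave to the observed return bin."""
    unnormalized = {
        name: w * HYPOTHESES[name][action][return_bin]
        for name, w in weights.items()
    }
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}


def mixture_q(weights, action):
    """Expected return of an action under the posterior-weighted mixture."""
    return sum(
        w * sum(p * g for p, g in zip(HYPOTHESES[name][action], RETURN_BINS))
        for name, w in weights.items()
    )


def greedy_action(weights, actions=("left", "right")):
    """Pick the action with the highest mixture expected return."""
    return max(actions, key=lambda a: mixture_q(weights, a))


prior = {"h_left_good": 0.5, "h_right_good": 0.5}
# Observe a return in bin 1 (value 1.0) after taking "left".
posterior = bayes_update(prior, "left", 1)
print(posterior, greedy_action(posterior))
```

The point of the sketch is only that the object being weighted and updated is a distribution over returns for each action, with no model of environment dynamics involved; which hypothesis class, prior, and exploration scheme the paper actually uses is not stated in this summary.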
