On the Possibility of Learning in Reactive Environments with Arbitrary Dependence
Daniil Ryabko, Marcus Hutter

TL;DR
This paper studies reinforcement learning in environments where observations may depend arbitrarily on past observations and actions, and identifies conditions under which an agent can attain the best possible asymptotic reward in every environment of a known countable class.
Contribution
It introduces sufficient conditions for learning in environments with arbitrary stochastic dependence, extending RL theory beyond traditional Markovian assumptions.
Findings
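Identifies sufficient conditions on a countable class of environments under which a single agent attains the best asymptotic reward in every environment of the class.
Analyzes how tight these conditions are and how they relate to classical probabilistic assumptions such as Markov Decision Processes and mixing conditions.
Covers environments more general than (PO)MDPs, in which observations may depend arbitrarily on past observations and actions.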
Abstract
We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e. environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
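The objective described in the abstract can be sketched formally as follows. The notation is an assumption made here for illustration and is not taken from the paper: $a_t$, $o_t$, $r_t$ denote the action, observation, and reward at step $t$; $\mathcal{C}$ is the known countable class; $\pi$ is the agent's policy; and $V^*_\nu$ stands for the best asymptotic average reward achievable in environment $\nu$. The exact mode of convergence (almost sure versus in expectation) and the use of limits rather than upper or lower limits are likewise assumptions of this sketch.

```latex
% An environment is a conditional distribution over the next
% observation and reward given the entire history (no Markov assumption):
\[
  \nu\bigl(o_t, r_t \,\big|\, a_1 o_1 r_1 \,\dots\, a_{t-1} o_{t-1} r_{t-1}\, a_t\bigr),
  \qquad \nu \in \mathcal{C} = \{\nu_1, \nu_2, \dots\}.
\]
% Goal: one policy \pi whose average reward approaches the optimum
% in every environment of the class:
\[
  \lim_{m \to \infty} \left( \frac{1}{m} \sum_{t=1}^{m} r_t^{\pi,\nu} - V^*_\nu \right) = 0
  \quad \text{almost surely, for every } \nu \in \mathcal{C}.
\]
```

In this reading, an MDP is the special case in which the conditional distribution depends on the history only through the most recent observation and the current action.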
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
