Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
Riccardo De Santi, Manish Prajapat, Andreas Krause

TL;DR
This paper introduces Global Reinforcement Learning (GRL), a framework that models complex, non-additive rewards over entire trajectories, enabling solutions for tasks involving intricate state interactions that traditional RL cannot handle.
Contribution
The paper proposes a novel algorithmic approach that uses submodular optimization techniques to reduce a GRL problem to standard RL problems, with approximation guarantees.
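To make the reduction idea more concrete, here is a minimal sketch of one standard building block from submodular optimization: a modular (additive) lower bound of a submodular set function, built from a chosen ordering of the elements. Because the bound is additive, its weights could in principle be read as per-state rewards for a standard RL problem. This summary does not state how the paper constructs its bounds, so the function `modular_lower_bound`, the toy `coverage` objective, and the `regions` data are all illustrative assumptions, not the authors' algorithm.

```python
from itertools import combinations

# Hypothetical toy objective: each "state" observes a set of cells,
# and the reward of a set of states is the number of distinct cells seen.
regions = {"s1": {1, 2}, "s2": {2, 3}, "s3": {3, 4}}

def coverage(states):
    """Normalized, monotone submodular function: size of the union of regions."""
    seen = set()
    for s in states:
        seen |= regions[s]
    return len(seen)

def modular_lower_bound(F, order):
    """Weights h[e] = F(prefix up to e) - F(prefix before e).

    For a normalized submodular F, sum(h[e] for e in X) <= F(X) for every
    subset X, with equality on every prefix of `order`.
    """
    weights, prefix, prev = {}, [], F([])
    for e in order:
        prefix.append(e)
        cur = F(prefix)
        weights[e] = cur - prev
        prev = cur
    return weights

order = ["s1", "s2", "s3"]
h = modular_lower_bound(coverage, order)

# Check the bound on every subset of the ground set.
bound_holds = all(
    sum(h[e] for e in subset) <= coverage(subset)
    for r in range(len(order) + 1)
    for subset in combinations(order, r)
)
# Check tightness on the prefixes of the chosen ordering.
tight_on_prefixes = all(
    sum(h[e] for e in order[:k]) == coverage(order[:k])
    for k in range(len(order) + 1)
)
```

The ordering determines where the additive surrogate is exact, which is why iterative schemes of this kind typically rebuild the bound around the current solution.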
Findings
Models non-additive, trajectory-wide rewards effectively
Provides approximation guarantees with curvature-dependent bounds
Empirically outperforms traditional RL in complex tasks
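For readers unfamiliar with curvature, the sketch below computes the standard total curvature of a normalized, monotone submodular set function, a quantity in [0, 1] that is 0 for additive functions and grows as elements overlap more. Whether the paper's bounds use exactly this notion is an assumption here, and the helper names and toy data are hypothetical.

```python
def total_curvature(F, ground_set):
    """Total curvature: 1 - min_e [F(V) - F(V \\ {e})] / F({e}).

    Assumes F is normalized, monotone, and submodular, and that at least
    one singleton has positive value.
    """
    V = frozenset(ground_set)
    full = F(V)
    ratios = [
        (full - F(V - {e})) / F(frozenset({e}))
        for e in V
        if F(frozenset({e})) > 0
    ]
    return 1.0 - min(ratios)

# Additive (modular) example: no interaction between elements.
weights = {"a": 1, "b": 2, "c": 3}
def additive(S):
    return sum(weights[e] for e in S)

# Overlapping coverage example: the two elements share cell 2.
regions = {"s1": {1, 2}, "s2": {2, 3}}
def coverage(S):
    seen = set()
    for e in S:
        seen |= regions[e]
    return len(seen)

kappa_additive = total_curvature(additive, weights)
kappa_coverage = total_curvature(coverage, regions)
```

In this toy case the additive function has curvature 0 and the overlapping coverage function has curvature 0.5.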
Abstract
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications such as experiment design, exploration, imitation learning, and risk-averse RL, to name a few. This is because additive objectives disregard interactions between states that are crucial for certain tasks. To tackle this problem, we introduce Global RL (GRL), where rewards are globally defined over trajectories instead of locally over states. Global rewards can capture negative interactions among states, e.g., in exploration, via submodularity; positive interactions, e.g., synergetic effects, via supermodularity; and mixed interactions via combinations of the two. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any…
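The difference between an additive objective and a global, trajectory-level one can be illustrated with a small sketch. It compares a per-state additive return with a coverage-style return (number of distinct states visited), a simple submodular objective that penalizes revisits. The state names, reward table, and function names are hypothetical and chosen only for illustration.

```python
def additive_return(trajectory, reward):
    """Classic RL-style objective: sum of per-state rewards along the trajectory."""
    return sum(reward[s] for s in trajectory)

def coverage_return(trajectory):
    """Global objective: number of distinct states visited (revisits add nothing)."""
    return len(set(trajectory))

def marginal_gain(trajectory, state):
    """Extra coverage obtained by appending `state` to `trajectory`."""
    return coverage_return(list(trajectory) + [state]) - coverage_return(trajectory)

reward = {"a": 1.0, "b": 1.0, "c": 1.0}
loop = ["a", "b", "a", "b"]    # keeps revisiting the same two states
sweep = ["a", "b", "c", "a"]   # reaches a third state

# The additive objective cannot tell the two trajectories apart...
same_additive = additive_return(loop, reward) == additive_return(sweep, reward)
# ...while the global coverage objective prefers the more exploratory one.
loop_cov, sweep_cov = coverage_return(loop), coverage_return(sweep)

# Diminishing returns: visiting "a" is worth less once "a" was already seen.
gain_fresh = marginal_gain([], "a")
gain_repeat = marginal_gain(["a", "b"], "a")
```

The value of visiting a state depends on what the trajectory has already seen, which is the kind of negative interaction a fixed per-state reward cannot express.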
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Adaptive Dynamic Programming Control · Energy Harvesting in Wireless Networks
