Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods

Riccardo De Santi; Manish Prajapat; Andreas Krause

arXiv:2407.09905 · cs.LG · July 16, 2024

TL;DR

This paper introduces Global Reinforcement Learning (GRL), a framework that models complex, non-additive rewards over entire trajectories, enabling solutions for tasks involving intricate state interactions that traditional RL cannot handle.

Contribution

The paper proposes a novel algorithmic approach for GRL that transforms it into standard RL problems with approximation guarantees based on submodular optimization techniques.
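To make the idea of reducing a global reward to standard RL more concrete, here is a minimal sketch of one standard tool from submodular optimization: a modular (additive) lower bound built from marginal gains along a fixed ordering of states. It is an illustration under assumptions, not the paper's algorithm: the coverage function, the `covers` mapping, and the state names are hypothetical. The resulting per-state weights form an ordinary additive reward that never overestimates the global objective and matches it on every prefix of the ordering.

```python
from itertools import combinations


def coverage(states, covers):
    """Hypothetical global reward: number of distinct items covered by `states`."""
    covered = set()
    for s in states:
        covered |= covers[s]
    return len(covered)


def modular_lower_bound(order, f):
    """Per-state additive rewards given by marginal gains of f along `order`.

    For a submodular f with f(empty) = 0, the sum of these weights over any
    subset is at most f(subset), with equality on prefixes of `order`.
    """
    weights, prefix, prev = {}, [], f([])
    for s in order:
        prefix.append(s)
        cur = f(prefix)
        weights[s] = cur - prev
        prev = cur
    return weights


# Hypothetical example: three states with overlapping coverage.
covers = {"s1": {1, 2}, "s2": {2, 3}, "s3": {4}}
f = lambda states: coverage(states, covers)
weights = modular_lower_bound(["s1", "s2", "s3"], f)

# Check the lower-bound property on every subset of states.
is_lower_bound = all(
    sum(weights[s] for s in subset) <= f(subset)
    for r in range(len(covers) + 1)
    for subset in combinations(covers, r)
)
```

In this example `s2` receives a weight of 1 rather than 2, because one of the items it covers was already covered by `s1` earlier in the ordering. An agent maximizing the additive `weights` is solving a standard RL-style objective that bounds the global one from below.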

Findings

01 Effective in modeling non-additive, trajectory-wide rewards

02 Provides approximation guarantees with curvature-dependent bounds

03 Empirically outperforms traditional RL in complex tasks
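For readers unfamiliar with curvature, the sketch below computes the total curvature of a monotone submodular set function in its standard form, one minus the smallest ratio between an element's marginal gain given all other elements and its gain on its own. This summary does not say which notion of curvature the paper's bounds use, so treat the definition here as an assumption; the coverage function and the example data are hypothetical. A curvature of 0 means the function is additive (no interaction between states), and values closer to 1 mean stronger diminishing returns.

```python
def coverage(states, covers):
    """Hypothetical global reward: number of distinct items covered by `states`."""
    covered = set()
    for s in states:
        covered |= covers[s]
    return len(covered)


def total_curvature(ground, f):
    """Total curvature of a monotone submodular f with f(empty) = 0.

    kappa = 1 - min_j (f(V) - f(V minus j)) / f({j}), over j with f({j}) > 0.
    """
    ground = frozenset(ground)
    full = f(ground)
    ratios = []
    for j in ground:
        singleton = f({j})
        if singleton > 0:
            ratios.append((full - f(ground - {j})) / singleton)
    # With no informative elements there is nothing to interact: report 0.
    return 1.0 - min(ratios) if ratios else 0.0


# Overlapping coverage: s1 and s2 share item 2, so gains diminish.
overlapping = {"s1": {1, 2}, "s2": {2, 3}, "s3": {4}}
kappa_overlap = total_curvature(overlapping, lambda s: coverage(s, overlapping))

# Disjoint coverage: the function is additive, so curvature is zero.
disjoint = {"s1": {1}, "s2": {2}, "s3": {3}}
kappa_disjoint = total_curvature(disjoint, lambda s: coverage(s, disjoint))
```

In the overlapping case, `s1` adds 2 items on its own but only 1 once the other states are present, which gives a ratio of 0.5 and a curvature of 0.5.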

Abstract

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications such as experiment design, exploration, imitation learning, and risk-averse RL to name a few. This is due to the fact that additive objectives disregard interactions between states that are crucial for certain tasks. To tackle this problem, we introduce Global RL (GRL), where rewards are globally defined over trajectories instead of locally over states. Global rewards can capture negative interactions among states, e.g., in exploration, via submodularity, positive interactions, e.g., synergetic effects, via supermodularity, while mixed interactions via combinations of them. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any…
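The difference between an additive objective and a globally defined one can be seen in a small sketch. The two functions below are illustrative assumptions, not definitions taken from the paper: `additive_reward` sums per-state rewards along a trajectory, while `coverage_reward` counts each distinct state only once, a simple submodular objective of the kind used to model exploration.

```python
def additive_reward(trajectory, reward):
    """Classic additive objective: every visit counts, including revisits."""
    return sum(reward[s] for s in trajectory)


def coverage_reward(trajectory, reward):
    """Global objective: each distinct visited state counts exactly once."""
    return sum(reward[s] for s in set(trajectory))


# Hypothetical three-state example with unit rewards.
reward = {"a": 1.0, "b": 1.0, "c": 1.0}
loop = ["a", "b", "a", "b"]   # bounces between two states
sweep = ["a", "b", "c", "a"]  # reaches a third state

additive_scores = (additive_reward(loop, reward), additive_reward(sweep, reward))
coverage_scores = (coverage_reward(loop, reward), coverage_reward(sweep, reward))
```

The additive objective scores both trajectories identically, so it cannot express a preference for visiting new states. The coverage objective ranks `sweep` above `loop` because a revisit contributes nothing, which is the negative interaction between states that the abstract describes.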

Taxonomy

Topics: Distributed Control Multi-Agent Systems · Adaptive Dynamic Programming Control · Energy Harvesting in Wireless Networks