Submodular Reinforcement Learning
Manish Prajapat, Mojmír Mutný, Melanie N. Zeilinger, Andreas Krause

TL;DR
This paper introduces Submodular Reinforcement Learning (SubRL), a framework for optimizing non-additive rewards with diminishing returns in RL. Its algorithm is motivated by greedy methods from classical submodular optimization and comes with theoretical guarantees and practical applications.
Contribution
The paper proposes SubRL, a new paradigm for RL with submodular rewards, and introduces SubPO, a policy gradient algorithm with approximation guarantees for this setting.
Findings
SubPO achieves constant factor approximations in submodular bandits.
SubRL is effective in applications like biodiversity monitoring and experiment design.
The approach scales to high-dimensional state-action spaces.
Abstract
In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are independent of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose submodular RL (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy…
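To make the diminishing-returns idea concrete, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical toy setting in which each state covers a set of cells and the reward is the number of distinct cells covered, a standard example of a monotone submodular function. The names `COVERAGE`, `coverage_reward`, `marginal_gain`, `additive_reward` and `greedy_select` are illustrative only. The greedy routine is the classical set-selection heuristic that the abstract cites as motivation, not SubPO itself.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy data: each state "covers" a set of cells.
COVERAGE: Dict[str, FrozenSet[int]] = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({3, 4}),
    "c": frozenset({1, 2}),
    "d": frozenset({5}),
}


def coverage_reward(visited: Iterable[str]) -> int:
    """Submodular reward: number of distinct cells covered by the visited states."""
    covered: Set[int] = set()
    for state in visited:
        covered |= COVERAGE[state]
    return len(covered)


def additive_reward(visited: Iterable[str]) -> int:
    """Additive baseline: each state is rewarded independently of the history."""
    return sum(len(COVERAGE[state]) for state in visited)


def marginal_gain(state: str, visited: Iterable[str]) -> int:
    """Extra reward from adding `state` to the states already visited."""
    history = list(visited)
    return coverage_reward(history + [state]) - coverage_reward(history)


def greedy_select(budget: int) -> List[str]:
    """Classical greedy: repeatedly pick the state with the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(budget):
        candidates = [s for s in COVERAGE if s not in chosen]
        best = max(candidates, key=lambda s: marginal_gain(s, chosen))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Visiting "c" alone is worth 2, but it adds nothing once "a" has been visited.
    print(marginal_gain("c", []), marginal_gain("c", ["a"]))
    # The additive reward double-counts the overlap; the submodular one does not.
    print(additive_reward(["a", "c"]), coverage_reward(["a", "c"]))
    print(greedy_select(2))
```

In this toy example the value of state `c` depends on what was visited before it, which is the history dependence that an additive, Markovian reward cannot express.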
Peer Reviews
Decision: ICLR 2024 spotlight
- The considered problem is interesting and significant.
- Extensive and rigorous experimental results are presented in Section 7.
- The paper is well written in general and easy to read.

- The idea behind the proposed algorithm, Submodular Policy Optimization, is quite straightforward: it is a relatively direct extension of the classical policy optimization algorithm.
- The analysis in Section 5 seems to be very restricted. Could the authors provide a similar analysis in more general settings?
Combining submodularity with reinforcement learning in a generalized way seems so intuitive that I am surprised it has not been proposed before. This emphasizes the significance of the paper's contribution. The main idea of the paper is simple yet powerful. Additionally, the paper is well written and the ideas are conveyed clearly.
These are minor suggestions for improvement rather than weaknesses:
- In the last paragraph of page 1, the adverbs "firstly", "secondly" and "thirdly" can simply be replaced with "first", "second" and "third". Also, the "we" after "firstly" should be lowercase.
- I think there can be a broader discussion of using submodular functions in reinforcement learning setups in the related work section. I am aware that the introduction also mentions some examples of submodular rewards, but I believe it is interesting e
- The submission introduces a novel and "mathematically" interesting framework that accounts for diminishing returns of repeated actions.
- The view of submodular rewards is fresh. The hardness result is new and interesting.
- The selected toy examples sound interesting and well-suited for the proposed framework.
- I do not see much contribution in the positive results. Not only does the assumption sound strong from a practical perspective, but it also seems quite contrived, made only for the sake of analysis.
- Literature review: I agree with the motivation from diminishing returns, but a submodular reward design is not the only way to address that. For example, there is a blocking-bandit style framework that discourages repeated actions [1]. Maybe good to discuss why the submodular reward design is better. I also
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms · Machine Learning and Data Classification
