Decomposition Methods with Deep Corrections for Reinforcement Learning
Maxime Bouton, Kyle Julian, Alireza Nakhaei, Kikuo Fujimura, and Mykel J. Kochenderfer

TL;DR
This paper introduces a neural network-based correction method to enhance utility decomposition in reinforcement learning, improving solution quality in large-scale multi-entity decision problems like fisheries management and autonomous driving.
Contribution
It proposes a novel deep correction approach that refines approximate solutions from decomposition methods, addressing their independence assumptions and suboptimality.
Findings
Correction method significantly improves decomposition performance
Outperforms policies trained directly on full-scale problems
Effective in multi-entity scenarios like fisheries and autonomous driving
Abstract
Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over…
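The pipeline the abstract describes (per-entity utilities, an arbitrator that fuses them, and a learned correction added on top) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the helper names `fuse_utilities`, `corrected_q`, and `arbitrate` are hypothetical, the fusion functions shown (`sum` and `min`) are just two plausible choices, and the correction is a hard-coded list standing in for the output of a trained neural network.

```python
from typing import Callable, Iterable, List, Sequence


def fuse_utilities(
    local_q: Sequence[Sequence[float]],
    fusion: Callable[[Iterable[float]], float] = sum,
) -> List[float]:
    """Combine per-entity action values into one value per action.

    local_q[i][a] is the utility of action `a` when only entity `i` is
    considered. `fusion` is an assumed combination rule (sum or min here).
    """
    n_actions = len(local_q[0])
    return [fusion(q[a] for q in local_q) for a in range(n_actions)]


def corrected_q(
    local_q: Sequence[Sequence[float]],
    correction: Sequence[float],
    fusion: Callable[[Iterable[float]], float] = sum,
) -> List[float]:
    """Add a per-action correction term to the fused utilities.

    In the approach described above the correction would come from a neural
    network evaluated on the global state; here it is a plain list.
    """
    fused = fuse_utilities(local_q, fusion)
    return [f + c for f, c in zip(fused, correction)]


def arbitrate(q_values: Sequence[float]) -> int:
    """Arbitrator: pick the index of the action with the highest value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical numbers: two entities, three actions.
local_q = [
    [1.0, 0.5, 0.0],  # utilities considering entity 0 alone
    [0.2, 0.9, 0.1],  # utilities considering entity 1 alone
]
# Stand-in for a learned correction that accounts for entity interactions.
correction = [0.5, 0.0, 0.0]

action_decomposed = arbitrate(fuse_utilities(local_q))
action_corrected = arbitrate(corrected_q(local_q, correction))
print(action_decomposed, action_corrected)
```

In this toy case the correction changes which action the arbitrator selects, which is the role the abstract assigns to the learned term: compensating for the independence assumption built into the decomposition.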
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
