Decomposition Methods with Deep Corrections for Reinforcement Learning

Maxime Bouton, Kyle Julian, Alireza Nakhaei, Kikuo Fujimura, and Mykel J. Kochenderfer

arXiv:1802.01772·cs.LG·April 24, 2019

TL;DR

This paper introduces a neural network-based correction method to enhance utility decomposition in reinforcement learning, improving solution quality in large-scale multi-entity decision problems like fisheries management and autonomous driving.

Contribution

It proposes a novel deep correction approach that refines approximate solutions from decomposition methods, addressing their independence assumptions and suboptimality.

Findings

01. Correction method significantly improves decomposition performance

02. Outperforms policies trained directly on full-scale problems

03. Effective in multi-entity scenarios like fisheries and autonomous driving

Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong assumptions of independence between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem where multiple boats must coordinate to maximize their catch over…
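The arbitration step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the choice of sum or min as the fusion rule, and all numbers are assumptions made for the example, and the correction term is passed in as a plain array standing in for the output of a trained neural network.

```python
import numpy as np


def fuse_utilities(local_q, mode="sum"):
    """Combine per-entity action values into one global estimate.

    local_q has shape (n_entities, n_actions): one row of action values
    per entity, each computed as if that entity were the only one present.
    The "sum" and "min" fusion rules are assumptions for this sketch.
    """
    local_q = np.asarray(local_q, dtype=float)
    if mode == "sum":
        return local_q.sum(axis=0)
    if mode == "min":
        return local_q.min(axis=0)
    raise ValueError(f"unknown fusion mode: {mode!r}")


def corrected_q(local_q, correction, mode="sum"):
    """Add a learned correction to the fused (approximate) action values.

    `correction` stands in for a network output with one value per action.
    """
    return fuse_utilities(local_q, mode) + np.asarray(correction, dtype=float)


def select_action(local_q, correction, mode="sum"):
    """Arbitrator: pick the action with the highest corrected value."""
    return int(np.argmax(corrected_q(local_q, correction, mode)))


# Hypothetical values: two entities, three actions.
local_q = [[1.0, 0.5, 0.2],
           [0.2, 0.6, 0.3]]
no_correction = [0.0, 0.0, 0.0]
correction = [-0.8, 0.0, 0.1]  # made-up stand-in for a network output

print(select_action(local_q, no_correction))  # decomposition alone
print(select_action(local_q, correction))     # decomposition plus correction
```

With these made-up numbers the decomposition alone prefers the first action, while the correction lowers its value enough that the arbitrator switches to the second. That is the kind of adjustment a learned correction is meant to supply when the independence assumption between local tasks does not hold.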

Code & Models

Repositories

sisl/AutomotivePOMDPs.jl
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research