Assessing Policy, Loss and Planning Combinations in Reinforcement Learning using a New Modular Architecture
Tiago Gaspar Oliveira, Arlindo L. Oliveira

TL;DR
This paper introduces a modular software architecture for model-based reinforcement learning, enabling flexible combination of planning algorithms, policies, and loss functions, demonstrated across three diverse environments.
Contribution
A novel modular architecture and set of reusable building blocks for constructing and testing model-based reinforcement learning agents.
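As a rough illustration of what interchangeable building blocks could look like, the sketch below composes a planner, a policy, and a loss function behind small interfaces. It is not the authors' code: every name here (Planner, Policy, Loss, OneStepLookahead, GreedyPolicy, mse_loss, Agent, toy_model) is a hypothetical stand-in, and the one-step lookahead planner is a simplification chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Sequence


class Planner(Protocol):
    """Hypothetical interface: maps a state to a value estimate per action."""
    def plan(self, state: int, actions: Sequence[int]) -> Dict[int, float]: ...


class Policy(Protocol):
    """Hypothetical interface: picks one action from the planner's estimates."""
    def select(self, action_values: Dict[int, float]) -> int: ...


class Loss(Protocol):
    """Hypothetical interface: scores predictions against targets."""
    def __call__(self, predicted: Sequence[float], target: Sequence[float]) -> float: ...


class OneStepLookahead:
    """Illustrative planner: scores each action with a user-supplied model."""
    def __init__(self, model: Callable[[int, int], float]) -> None:
        self.model = model

    def plan(self, state: int, actions: Sequence[int]) -> Dict[int, float]:
        return {a: self.model(state, a) for a in actions}


class GreedyPolicy:
    """Illustrative policy: selects the action with the highest estimate."""
    def select(self, action_values: Dict[int, float]) -> int:
        return max(action_values, key=action_values.get)


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


@dataclass
class Agent:
    """Assembles the three blocks; any one can be swapped independently."""
    planner: Planner
    policy: Policy
    loss: Loss

    def act(self, state: int, actions: Sequence[int]) -> int:
        return self.policy.select(self.planner.plan(state, actions))


def toy_model(state: int, action: int) -> float:
    """Assumed toy model: prefers actions that move the state toward zero."""
    return -abs(state + action)


agent = Agent(OneStepLookahead(toy_model), GreedyPolicy(), mse_loss)
chosen = agent.act(3, [-1, 0, 1])           # action that brings state 3 closest to 0
error = agent.loss([1.0, 2.0], [1.0, 4.0])  # mean of squared differences
```

Because the agent depends only on the three interfaces, trying a different planner or loss amounts to passing a different object to the constructor, which is the kind of combination testing the summary describes.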
Findings
The architecture facilitates problem-specific customization.
The averaged minimax planning algorithm performs well across environments.
Optimal component combinations are highly environment-dependent.
Abstract
The model-based reinforcement learning paradigm, which uses planning algorithms and neural network models, has recently achieved unprecedented results in diverse applications, leading to what is now known as deep reinforcement learning. These agents are quite complex and involve multiple components, factors that can create challenges for research. In this work, we propose a new modular software architecture suited for these types of agents, and a set of building blocks that can be easily reused and assembled to construct new model-based reinforcement learning agents. These building blocks include planning algorithms, policies, and loss functions. We illustrate the use of this architecture by combining several of these building blocks to implement and test agents that are optimized for three different test environments: Cartpole, Minigrid, and Tictactoe. One particular planning…
Taxonomy
Topics: Software Engineering Research · Advanced Software Engineering Methodologies · Reinforcement Learning in Robotics
