Simion Zoo: A Workbench for Distributed Experimentation with Reinforcement Learning for Continuous Control Tasks

Borja Fernandez-Gauna; Manuel Graña; Roland S. Zimmermann

arXiv:1904.07817·cs.LG·April 17, 2019


TL;DR

Simion Zoo is a comprehensive RL workbench with an easy GUI, support for distributed execution including GPUs, and tools for exploring RL hyperparameters to facilitate continuous control task experimentation.

Contribution

It introduces a user-friendly RL experimentation platform supporting distributed computing and hyperparameter exploration, enhancing research efficiency.

Findings

01. Supports distributed RL experiments with GPUs
02. Enables concurrent hyperparameter tuning
03. Provides statistical and visual analysis tools

Abstract

We present Simion Zoo, a Reinforcement Learning (RL) workbench that provides a complete set of tools to design, run, and analyze the results, both statistically and visually, of RL control applications. The main features that set apart Simion Zoo from similar software packages are its easy-to-use GUI, its support for distributed execution including deployment over graphics processing units (GPUs), and the possibility to explore concurrently the RL metaparameter space, which is key to successful RL experimentation.


Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Control Systems Optimization

Full text


1 Introduction

In recent years, Reinforcement Learning (RL) has become a very popular area of research, because the almost exponential increase in computing power brought by dedicated GPUs has enabled researchers to tackle previously unaffordable problems. In particular, the successful applications of Deep Reinforcement Learning (DRL) to produce master videogame players [10, 7] have created great expectations about the potential of DRL, even outside the academic research community. As a result of this popularity boost, the number of RL software packages has grown significantly. Nevertheless, these projects are mostly oriented towards the research community, i.e. they assume users with sophisticated programming skills and powerful computing resources to run the software. Even for sophisticated programmers, these packages impose a steep learning curve that hinders the user experience. This is in stark contrast with the de facto standards for Supervised Learning (SL) software, which customarily allows users to design and run experiments and to analyze the results through an intuitive Graphical User Interface (GUI) that is quick to learn. Users without programming skills who intend to design and run RL experiments quickly on inexpensive and commonly available hardware will appreciate such facilities. Let us enumerate the most important features that a user-friendly tool should possess: (a) easy installation, (b) a GUI usable by non-programmers, (c) graphical visualization of the experiment and of the relevant data structures, which helps to understand the performance achieved by the learning process, (d) facilities for statistical analysis of the results, (e) concurrent exploration of the metaparameter space, which shortens the trial-and-error cycle, (f) efficient use of heterogeneous computing resources, and (g) easy application to a broad range of RL problems, algorithms and controllers.

The paper is structured as follows: Section 2 presents the Simion Zoo workbench, Section 3 briefly reviews related work, and Section 4 discusses the availability of the project documentation and its requirements.

2 Simion Zoo Workbench

Simion Zoo [4] is a workbench to test RL and DRL algorithms that focuses on model-free RL algorithms applied to control problems defined on continuous state and action spaces. This software was designed to fulfill the requirements enumerated in Section 1. (a) The installation process is straightforward. A single installer is provided to install the Herd Agent service/daemon on the slave machines. On the master computer, the user only needs to run Badger (the main application), which requires no installation and bundles all the dependencies. (b) The configuration, execution and analysis of results of the RL experiments are done via a user-friendly GUI. (c) Experiments can be run either locally on the master computer or distributed over the slave computers. Locally run experiments can be visualized live, whereas remotely executed experiments are monitored live (showing the average episode rewards) and can also be visualized off-line once finished. (d) Finished experiments can be further analyzed with a provided tool that generates publication-quality plots and statistics of the system variables. (e) Parameters can be forked and given as many values as desired, so that experiments with all the parameter value combinations are run concurrently. (f) Each slave machine receives binaries that are compatible with its own operating system and architecture, taking advantage of all the resources available on the computer (all the CPU cores and/or the GPU). The project currently supports Windows and Linux operating systems. (g) The workbench features a wide set of built-in environments and agents.

The user can use conventional controllers (Proportional-Integral-Derivative, Linear-Quadratic Regulator, Variable-Speed Wind-Turbine (VSWT)), specific controllers [3], Q-function learning algorithms (SARSA, Q-Learning, and Double Q-Learning [6]), Actor-Critic algorithms (CACLA, regular gradient ascent, Incremental Natural Actor-Critic [1], Off-Policy Actor-Critic [2], and Off-Policy Deterministic Actor-Critic [8]), and Deep RL methods (DQN, Double-DQN and DDPG). In addition, policy learners (Actors) can be combined with value function learners (Critics): Temporal-Difference($\lambda$), TDC($\lambda$), and True Online Temporal-Difference [9].
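To make the Critic role concrete, the following is a minimal sketch of a textbook Temporal-Difference($\lambda$) value estimator with linear function approximation and accumulating eligibility traces. It is not taken from the Simion Zoo source code: the class name `TDLambdaCritic`, its parameters and its default values are assumptions chosen for illustration.

```python
class TDLambdaCritic:
    """Textbook linear TD(lambda) critic (illustrative, not Simion Zoo code)."""

    def __init__(self, n_features, alpha=0.1, gamma=0.99, lam=0.9):
        self.w = [0.0] * n_features  # value-function weights
        self.z = [0.0] * n_features  # accumulating eligibility traces
        self.alpha = alpha           # learning rate (assumed default)
        self.gamma = gamma           # discount factor (assumed default)
        self.lam = lam               # trace decay (assumed default)

    def value(self, phi):
        """Estimated state value: dot product of weights and features."""
        return sum(w_i * p_i for w_i, p_i in zip(self.w, phi))

    def update(self, phi, reward, phi_next, terminal=False):
        """Apply one TD(lambda) step and return the TD error."""
        v_next = 0.0 if terminal else self.value(phi_next)
        delta = reward + self.gamma * v_next - self.value(phi)
        decay = self.gamma * self.lam
        # Decay the old traces, then add the current feature vector.
        self.z = [decay * z_i + p_i for z_i, p_i in zip(self.z, phi)]
        # Move every weight along its trace, scaled by the TD error.
        self.w = [w_i + self.alpha * delta * z_i
                  for w_i, z_i in zip(self.w, self.z)]
        if terminal:
            self.z = [0.0] * len(self.z)  # traces do not cross episodes
        return delta
```

In an Actor-Critic arrangement, the TD error returned by `update` is the quantity an Actor would typically use to scale its own policy update.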

The workbench offers a broad set of built-in simulation environments: classical benchmarking control tasks (mountain-car, balancing pole, swing-up pendulum, and double swing-up pendulum), some benchmark tasks from [5] (underwater vehicle control and airplane pitch control), several single and multi-robot control problems that use the Bullet Physics library, and two VSWT models (a two-mass model, and OpenFAST (https://nwtc.nrel.gov/FAST), which is considered the state of the art in wind turbine simulation).

2.1 Using Simion Zoo

The main application is Badger, which offers a three-phase experimentation pipeline: design, monitoring and analysis. Each phase has a dedicated tab within Badger's GUI. In the Editor tab, the user selects the agent type and the simulation environment, and sets the values/options of their parameters. The learning and simulation parameters, also known as metaparameters, are organized in a hierarchy, change dynamically depending on the user's choices, and can be given several values. Once the experiments are designed (several experiments may be designed and run concurrently), the user can press the Launch button, which generates a set of experimental units, each corresponding to one combination of metaparameter values, and then switch to the Monitor tab. On its left side, this tab shows a list of the available Herd Agents and their capabilities, allowing the user to select all or a subset of them for the next experiment. Once started, the progress of each experimental unit is shown, allowing the user to cancel an experiment early if the learning performance is not as good as expected. Finally, the user may analyze the learning and simulation results in the Reports tab by selecting a subset of the logged variables, grouping experimental units by parameter value, and visualizing the experiment. The user guide can be accessed online: https://github.com/simionsoft/SimionZoo/wiki/User-guide. Most remarkably, Simion Zoo generates customizable plots ready for publication, such as those published in [3].
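The expansion of forked metaparameters into experimental units amounts to a Cartesian product over the forked values. The sketch below illustrates that idea only. It assumes a flat dictionary in which a list marks a forked parameter, whereas Badger organizes parameters hierarchically, and the function name `expand_forks` and all parameter names are hypothetical.

```python
from itertools import product


def expand_forks(config):
    """Yield one flat dict per combination of forked parameter values.

    A list value marks a forked parameter; any other value is fixed.
    """
    forked = {k: v for k, v in config.items() if isinstance(v, list)}
    fixed = {k: v for k, v in config.items() if not isinstance(v, list)}
    names = list(forked)
    for values in product(*(forked[name] for name in names)):
        yield {**fixed, **dict(zip(names, values))}


# Hypothetical experiment: two forked parameters, 2 x 3 combinations.
experiment = {
    "agent": "Q-Learning",
    "environment": "mountain-car",
    "learning_rate": [0.01, 0.1],
    "discount": [0.9, 0.99, 0.999],
}
units = list(expand_forks(experiment))
print(len(units))  # prints 6
```

Each resulting dictionary is self-contained, which is what allows units to be handed to different machines and later grouped by the value of any one parameter.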

2.2 Extending Simion Zoo

Extending or modifying the source code does not require changing the GUI. The source code is automatically parsed after compilation to generate the object class definitions and their parameters, so that the GUI automatically adapts to the latest version of the code. The developer guide can be accessed online: https://github.com/simionsoft/SimionZoo/wiki/Developer-guide.
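The general idea of deriving a GUI description from the code itself can be illustrated with a small analogy. The sketch below does not reproduce Simion Zoo's parser, which works on the project's own source code. It uses Python introspection instead, and the `QLearning` class, its parameters and the `describe_parameters` helper are all hypothetical.

```python
import inspect


class QLearning:
    """Hypothetical agent class, used only for illustration."""

    def __init__(self, learning_rate: float = 0.1, discount: float = 0.99,
                 epsilon: float = 0.05):
        self.learning_rate = learning_rate
        self.discount = discount
        self.epsilon = epsilon


def describe_parameters(cls):
    """Return (name, type name, default) for each constructor parameter."""
    schema = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if param.annotation is inspect.Parameter.empty:
            type_name = "any"  # no annotation available
        else:
            type_name = getattr(param.annotation, "__name__",
                                str(param.annotation))
        has_default = param.default is not inspect.Parameter.empty
        schema.append((name, type_name, param.default if has_default else None))
    return schema


for entry in describe_parameters(QLearning):
    print(entry)
```

A form generator could walk such a schema to build one input widget per parameter, so adding a constructor argument would change the form without any GUI code being edited.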

3 Related Work

There are several programming libraries that provide RL-related functionalities, e.g. Deep Neural Network libraries such as TensorFlow (https://www.tensorflow.org/), Caffe (http://caffe.berkeleyvision.org/), or Microsoft Cognitive Toolkit (https://www.microsoft.com/en-us/cognitive-toolkit/). These libraries provide the bottom layer on which DRL algorithms are built. Above them in the hierarchy are RL libraries that provide algorithm implementations and environments but no GUI or configurable executable. Some of the most popular are pyBrain (http://www.pybrain.org/), RL Park (http://rlpark.github.io/site), RLLib (https://github.com/samindaa/RLLib), and RL Library (http://library.rl-community.org). Closest to our RL workbench software, we note two RL simulation environments that offer a full GUI to edit, run and view/analyze RL experiments: Maja Machine Learning Framework (MMLF, http://mmlf.sourceforge.net) and RL Sim (https://www.cs.cmu.edu/~awm/rlsim/). The former does not support distributed execution, GPUs or multi-threaded execution, and seems to have been abandoned. The latter is a very simple educational tool with only one configurable grid world environment and some of the most typical tabular RL algorithms.

4 Documentation, Licensing and Availability

Simion Zoo has been published as open source in Zenodo [4]. Contributions to the project's source code can be made through our public GitHub repository (https://github.com/simionsoft/SimionZoo), where the reader can also find the documentation (https://github.com/simionsoft/SimionZoo/wiki) and pre-compiled Windows and Linux binaries for the end-user (https://github.com/borjafdezgauna/SimionZoo/releases). Simion Zoo currently works under Windows and Linux, and is licensed under an MIT license.

5 Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 777720. The work reported in this paper has been partially supported by FEDER funds for the MINECO project TIN2017-85827-P, and projects KK-2018/00071 and KK-2018/00082 of the Elkartek 2018 funding program of the Basque Government. We would like to thank Unai Tercero, Asier Rodriguez-Gonzalez and José-Alejandro Guerra-Denis for their contributions to the project.

Bibliography


[1] T. Degris, P. M. Pilarski, and R. S. Sutton. Model-free reinforcement learning with continuous action in practice. In 2012 American Control Conference (ACC), pages 2177–2182, June 2012.
[2] Thomas Degris, Martha White, and Richard S. Sutton. Off-policy actor-critic. CoRR, abs/1205.4839, 2012.
[3] Borja Fernández-Gauna, Unai Fernandez-Gamiz, and Manuel Graña. Variable speed wind turbine controller adaptation by reinforcement learning. Integrated Computer-Aided Engineering, 24(1):27–39, 2017.
[4] Borja Fernandez Gauna, Manuel Graña, and Roland S Zimmermann. Simion Zoo: a software bundle for Reinforcement Learning applications. https://doi.org/10.5281/zenodo.2579013, February 2019.
[5] Roland Hafner and Martin Riedmiller. Reinforcement learning in feedback control: Challenges and benchmarks from technical process control. Machine Learning, 84(1-2):137–169, 2011.
[6] Hado V. Hasselt. Double Q-learning. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2613–2621. Curran Associates, Inc., 2010.
[7] David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–503, 2016.
[8] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning - Volume 32, ICML'14, pages I–387–I–395. JMLR.org, 2014.