# Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

**Authors:** Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

arXiv: 1705.07262 · 2018-01-26

## TL;DR

This paper evaluates the Particle Swarm Optimization Policy (PSO-P) on the Industrial Benchmark (IB) and shows that, in this complex and realistic industrial RL setting, it performs better and more robustly than policies derived from the model-based RCNN and the model-free NFQ.

## Contribution

It provides the first empirical assessment of PSO-P on a realistic industrial benchmark, where PSO-P outperforms policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ).

## Key findings

- PSO-P yielded the best-performing policy in the IB setting studied.
- PSO-P was robust and required relatively little effort for parameter tuning and design decisions.
- PSO-P compared favorably with RCNN and NFQ on the IB.

## Abstract

The Particle Swarm Optimization Policy (PSO-P) was recently introduced and shown to produce remarkable results on academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate its properties and its feasibility for real-world applications, this paper evaluates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims to be realistic by including a variety of aspects found in industrial applications, such as continuous state and action spaces, a high-dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is of interest not only for academic benchmarks but also for real-world industrial applications, since it yielded the best-performing policy in our IB setting. Compared to other well-established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring relatively little effort to find adequate parameters or make complex design decisions.
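Unlike the closed-form RCNN and NFQ policies, PSO-P computes each action online. In our understanding of the original PSO-P work, a system model is learned from the batch data. At every time step, particle swarm optimization then searches over a finite-horizon action sequence for the highest model-predicted discounted return, and only the first action of the best sequence is applied. The sketch below illustrates that receding-horizon idea under explicit assumptions. `toy_model` is a hypothetical one-dimensional stand-in for a learned IB model. The global-best swarm topology and the inertia/acceleration constants (common textbook values) are our choices and may differ from the paper's setup.

```python
import random

def toy_model(state, action):
    """Hypothetical stand-in for a model learned from batch data:
    the action shifts a scalar state, and reward penalizes distance from 0."""
    next_state = state + action
    reward = -next_state ** 2
    return next_state, reward

def predicted_return(model, state, actions, gamma=0.9):
    """Discounted return of an action sequence, simulated with the model."""
    total, discount = 0.0, 1.0
    for a in actions:
        state, r = model(state, a)
        total += discount * r
        discount *= gamma
    return total

def pso_policy(model, state, horizon=5, n_particles=20, n_iter=50,
               a_min=-1.0, a_max=1.0, w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Optimize an action sequence with global-best PSO; return its first action."""
    rng = random.Random(seed)
    vmax = a_max - a_min
    # Each particle is a candidate action sequence of length `horizon`.
    pos = [[rng.uniform(a_min, a_max) for _ in range(horizon)]
           for _ in range(n_particles)]
    vel = [[0.0] * horizon for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [predicted_return(model, state, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(horizon):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))  # clamp velocity
                pos[i][d] = max(a_min, min(a_max, pos[i][d] + vel[i][d]))  # keep actions in bounds
            fit = predicted_return(model, state, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    # Receding horizon: apply only the first action, re-plan at the next step.
    return gbest[0]

if __name__ == "__main__":
    print(pso_policy(toy_model, 3.0))  # close to -1.0: push the state toward 0
```

Because the policy is only as good as the model it plans with, all learning effort goes into the model. This is consistent with the abstract's point that PSO-P needs few policy design decisions.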

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.07262/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1705.07262/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1705.07262/full.md

---
Source: https://tomesphere.com/paper/1705.07262