ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation
Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna

TL;DR
ContainerGym is a new benchmark derived from a real-world industrial resource allocation problem, designed to evaluate reinforcement learning algorithms on complex, realistic decision-making tasks with varying difficulty levels.
Contribution
The paper introduces a versatile reinforcement learning benchmark, derived from a real-world industrial problem, that captures practical challenges such as uncertainty and supports the evaluation of algorithms such as PPO, TRPO, and DQN.
Findings
Baseline results show limitations of PPO, TRPO, and DQN on the benchmark.
The benchmark can be configured for different problem complexities.
Statistical analysis highlights specific weaknesses of existing algorithms.
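To make the idea of a configurable benchmark concrete, the following is a minimal sketch of a toy allocation environment whose difficulty is set by two knobs: the number of resources (dimensionality) and a noise level (uncertainty). Everything here is a hypothetical illustration: `ToyAllocationEnv`, its parameters, its dynamics, and its reward are assumptions made for this example and are not ContainerGym's actual API or model.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAllocationEnv:
    """Hypothetical toy environment; NOT the ContainerGym API."""
    n_resources: int = 3    # dimensionality knob (assumed)
    noise: float = 0.02     # uncertainty knob: std. dev. of the growth noise (assumed)
    capacity: float = 1.0   # a level above this counts as an overflow
    horizon: int = 50       # maximum number of steps per episode
    seed: int = 0

    def reset(self):
        # Must be called before step(); re-seeds so episodes are reproducible.
        self.rng = random.Random(self.seed)
        self.levels = [0.0] * self.n_resources
        self.t = 0
        return list(self.levels)

    def step(self, action):
        # Actions 0..n_resources-1 empty that resource; any larger index waits.
        reward = 0.0
        if action < self.n_resources:
            reward = self.levels[action]  # emptying a fuller resource pays more
            self.levels[action] = 0.0
        # Every level grows by a fixed drift plus Gaussian noise.
        for i in range(self.n_resources):
            growth = 0.05 + self.rng.gauss(0.0, self.noise)
            self.levels[i] = max(0.0, self.levels[i] + growth)
        self.t += 1
        overflow = any(level > self.capacity for level in self.levels)
        if overflow:
            reward -= 1.0  # penalty; an overflow also ends the episode
        done = overflow or self.t >= self.horizon
        return list(self.levels), reward, done


def run_episode(env, policy):
    """Roll out one episode and return the cumulative reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def greedy_policy(obs):
    """Always empty the fullest resource (ties go to the lowest index)."""
    return max(range(len(obs)), key=obs.__getitem__)


if __name__ == "__main__":
    # Compare an easier and a harder configuration of the same toy problem.
    for n, noise in [(3, 0.0), (12, 0.05)]:
        env = ToyAllocationEnv(n_resources=n, noise=noise)
        print(n, noise, run_episode(env, greedy_policy))
```

Raising `n_resources` or `noise` makes the same task harder for a fixed policy, which is the kind of difficulty scaling a configurable benchmark allows one to study.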
Abstract
We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our…
Taxonomy
Topics: Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms
Methods: Convolution · Q-Learning · Dense Connections · Entropy Regularization · Deep Q-Network · Trust Region Policy Optimization · Proximal Policy Optimization
