Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering

Abhijeet Pendyala; Asma Atamna; Tobias Glasmachers

arXiv:2404.02577 · cs.LG · July 24, 2024 · 1 citation

TL;DR

This paper introduces a curriculum learning-enhanced proximal policy optimization method with reward engineering to effectively train an agent for complex, real-world waste sorting, balancing safety, efficiency, and resource use.

Contribution

It presents a novel five-stage curriculum learning approach combined with reward engineering to train RL agents for complex industrial tasks with delayed rewards and class imbalance.

Findings

01. Achieved near-zero safety violations during inference

02. Significantly improved waste sorting efficiency

03. Demonstrated robustness in a complex, real-world environment

Abstract

We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of effectively balancing the competing objectives of operational safety, volume optimization, and minimizing resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem due to its inherent complexities. This problem is particularly difficult due to the environment's extremely delayed rewards with long time horizons and class (or action) imbalance, with important actions being infrequent in the optimal policy. This forces the agent to anticipate long-term action consequences and prioritize rare but rewarding behaviours, creating a non-trivial reinforcement learning task. Our five-stage CL approach tackles…
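
For orientation, the clipped surrogate objective that standard PPO maximizes can be written as below. This is the textbook form of PPO, not something stated in this abstract. The symbols (probability ratio $r_t$, advantage estimate $\hat{A}_t$, clip range $\epsilon$) are notation assumed here, and the paper's exact variant and hyperparameters may differ.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},
\qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\left(
    r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
  \right)
\right]
```

Clipping removes the incentive to push the ratio outside $[1-\epsilon,\, 1+\epsilon]$, which limits how far a single update can move the policy.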

Taxonomy

Topics: Optimization and Search Problems · Spreadsheets and End-User Computing