Sample-efficient Neuro-symbolic Proximal Policy Optimization
Simone Murari, Celeste Veronese, Daniele Meli

TL;DR
This paper introduces a neuro-symbolic extension of Proximal Policy Optimization (PPO) that uses partial logical policy specifications, learned on easier task instances, to speed up learning in harder sparse-reward reinforcement learning tasks.
Contribution
It proposes two ways to integrate symbolic guidance into PPO: H-PPO-Product, which biases the action distribution at sampling time, and H-PPO-SymLoss, which adds a symbolic regularization term to the PPO loss.
Findings
Faster learning than standard PPO and a Reward Machine baseline.
Higher return at convergence on OfficeWorld, WaterWorld, and DoorKey.
Gains persist even when the symbolic knowledge is imperfect.
Abstract
Deep Reinforcement Learning (DRL) algorithms often require a large amount of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We introduce two integrations of symbolic guidance: (i) H-PPO-Product, which biases the action distribution at sampling time, and (ii) H-PPO-SymLoss, which augments the PPO loss with a symbolic regularization term. We evaluate our methods on three benchmarks (OfficeWorld, WaterWorld, and DoorKey), showing consistently faster learning and higher return at convergence than PPO and a Reward Machine baseline, including under imperfect symbolic knowledge.
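The two integrations named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the symbolic guidance is represented as a boolean mask over discrete actions, the "product" is an elementwise reweighting followed by renormalization, and the regularizer is the negative log of the probability mass placed on suggested actions. The names `product_bias`, `symbolic_penalty`, `total_loss`, `boost`, and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def product_bias(policy_probs, symbolic_mask, boost=3.0):
    """Assumed sampling-time bias: multiply pi(a|s) by a symbolic prior
    (weight `boost` on suggested actions, 1.0 elsewhere) and renormalize."""
    prior = np.where(symbolic_mask, boost, 1.0)
    biased = policy_probs * prior
    return biased / biased.sum()


def symbolic_penalty(policy_probs, symbolic_mask, eps=1e-8):
    """Assumed regularizer: -log of the probability mass the policy puts
    on symbolically suggested actions. Zero when nothing is suggested."""
    if not symbolic_mask.any():
        return 0.0
    mass = policy_probs[symbolic_mask].sum()
    return float(-np.log(mass + eps))


def total_loss(ppo_loss, policy_probs, symbolic_mask, lam=0.1):
    """Assumed loss augmentation: PPO loss plus a weighted symbolic term."""
    return ppo_loss + lam * symbolic_penalty(policy_probs, symbolic_mask)


# Toy state with four actions; the (hypothetical) symbolic rule suggests action 1.
probs = softmax(np.zeros(4))                   # uniform policy
mask = np.array([False, True, False, False])

# Product-style variant: sample from the biased distribution.
biased = product_bias(probs, mask)
rng = np.random.default_rng(0)
action = rng.choice(len(biased), p=biased)

# SymLoss-style variant: sample from the policy as usual, penalize in the loss.
loss = total_loss(ppo_loss=1.0, policy_probs=probs, symbolic_mask=mask)
```

In this sketch the first variant changes which actions are collected during rollouts, while the second leaves sampling untouched and only shapes the gradient. Both reduce to plain PPO when the mask is empty, which is one simple way to tolerate states where no symbolic suggestion applies.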
