Sample-efficient Neuro-symbolic Proximal Policy Optimization

Simone Murari; Celeste Veronese; Daniele Meli

arXiv:2604.25534·cs.AI·April 29, 2026

TL;DR

This paper introduces a neuro-symbolic extension of Proximal Policy Optimization (PPO) that uses partial logical policy specifications, learned in easier task instances, to guide learning in sparse-reward tasks with long planning horizons and multiple sub-goals.

Contribution

It proposes two ways of integrating symbolic guidance into PPO: H-PPO-Product, which biases the action distribution at sampling time, and H-PPO-SymLoss, which adds a symbolic regularization term to the PPO loss.

Findings

01. Faster learning compared to standard PPO.

02. Higher returns at convergence across benchmarks.

03. Effective even with imperfect symbolic knowledge.

Abstract

Deep Reinforcement Learning (DRL) algorithms often require a large amount of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We introduce two integrations of symbolic guidance: (i) H-PPO-Product, which biases the action distribution at sampling time, and (ii) H-PPO-SymLoss, which augments the PPO loss with a symbolic regularization term. We evaluate our methods on three benchmarks (OfficeWorld, WaterWorld, and DoorKey), showing consistently faster learning and higher return at convergence than both PPO and a Reward Machine baseline, even when the symbolic knowledge is imperfect.
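
The abstract names the two integrations without giving their formulas, so the following is only a minimal sketch of what each idea could look like for a discrete action space. Everything in it is an assumption made for illustration: the function names (product_bias, symbolic_loss, total_loss), the representation of symbolic guidance as a probability vector over actions (symbolic_prior), the multiply-and-renormalize form of the product, the cross-entropy form of the regularizer, and the weight lam. It is not the authors' implementation.

```python
import math

def product_bias(policy_probs, symbolic_prior, eps=1e-8):
    """Sampling-time bias (assumed form): multiply the policy's action
    probabilities by a symbolic prior, then renormalize."""
    weighted = [p * (q + eps) for p, q in zip(policy_probs, symbolic_prior)]
    total = sum(weighted)
    return [w / total for w in weighted]

def symbolic_loss(policy_probs, symbolic_prior, eps=1e-8):
    """Regularizer (assumed form): cross-entropy between the symbolic
    prior and the policy's action distribution."""
    return -sum(q * math.log(p + eps)
                for p, q in zip(policy_probs, symbolic_prior))

def ppo_clip_loss(ratio, advantage, clip=0.2):
    """Standard PPO clipped surrogate for one sample, as a loss."""
    clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
    return -min(ratio * advantage, clipped * advantage)

def total_loss(ratio, advantage, policy_probs, symbolic_prior, lam=0.1):
    """PPO loss plus a weighted symbolic term; lam is a hypothetical weight."""
    return (ppo_clip_loss(ratio, advantage)
            + lam * symbolic_loss(policy_probs, symbolic_prior))

# Hypothetical example: a uniform policy over 4 actions and a symbolic
# prior that prefers action 0.
policy = [0.25, 0.25, 0.25, 0.25]
prior = [0.7, 0.1, 0.1, 0.1]
biased = product_bias(policy, prior)
loss = total_loss(ratio=1.5, advantage=1.0,
                  policy_probs=policy, symbolic_prior=prior)
```

In this sketch the product variant changes only which actions are sampled, while the loss variant leaves sampling untouched and instead pulls the policy toward the symbolic prior during the gradient update. A soft prior (no action has exactly zero weight) is one way to tolerate imperfect symbolic knowledge, though whether the paper does this is not stated in the abstract.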
