BlendRL: A Framework for Merging Symbolic and Neural Policy Learning

Hikaru Shindo; Quentin Delfosse; Devendra Singh Dhami; Kristian Kersting

arXiv:2410.11689·cs.LG·April 22, 2025

TL;DR

BlendRL is a neuro-symbolic reinforcement learning framework that combines neural and symbolic policies, leading to improved performance and robustness in Atari games compared to purely neural or symbolic approaches.

Contribution

This work introduces BlendRL, a novel framework that integrates symbolic reasoning with neural policies within reinforcement learning agents.

Findings

01

BlendRL outperforms neural and symbolic baselines in Atari environments.

02

BlendRL agents demonstrate robustness to environmental changes.

03

Analysis shows that hybrid policies help overcome the limitations of purely neural and purely symbolic policies.

Abstract

Humans can leverage both symbolic reasoning and intuitive reactions. In contrast, reinforcement learning policies are typically encoded in either opaque systems like neural networks or symbolic systems that rely on predefined symbols and rules. This disjointed approach severely limits the agents' capabilities, as they often lack either the flexible low-level reaction characteristic of neural agents or the interpretable reasoning of symbolic agents. To overcome this challenge, we introduce BlendRL, a neuro-symbolic RL framework that harmoniously integrates both paradigms within RL agents that use mixtures of both logic and neural policies. We empirically demonstrate that BlendRL agents outperform both neural and symbolic baselines in standard Atari environments, and showcase their robustness to environmental changes. Additionally, we analyze the interaction between neural and symbolic…
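To make the phrase "mixtures of both logic and neural policies" concrete, here is a minimal sketch of one way two action distributions could be blended. It assumes a simple convex combination controlled by a scalar weight; the excerpt above does not say how BlendRL computes or learns its blending weights, so `blend_policies`, `beta`, and the example numbers are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_policies(neural_probs, logic_probs, beta):
    """Convex mixture of two action distributions (an assumed blending rule).

    beta = 1.0 uses only the neural policy; beta = 0.0 uses only the
    logic policy. Both inputs must be distributions over the same actions.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(neural_probs) != len(logic_probs):
        raise ValueError("policies must share one action space")
    return [beta * n + (1.0 - beta) * l
            for n, l in zip(neural_probs, logic_probs)]


# Hypothetical example with three actions.
neural = softmax([2.0, 0.5, 0.1])   # neural policy prefers action 0
logic = [0.0, 1.0, 0.0]             # a fired rule selects action 1
mixed = blend_policies(neural, logic, beta=0.25)
```

Because both inputs sum to one and the weights are non-negative and sum to one, the result is again a valid distribution. In this sketch the weight is a fixed constant; a learned, state-dependent weight would be a natural extension, but that is an assumption beyond what the abstract states.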

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. The paper is clearly written and easy to follow. 2. The utilization of the language model shows a certain level of innovation.

Weaknesses

1. The overall concept is not particularly novel, with numerous similar works, such as the well-known fast and slow systems, already existing. 2. The paper and appendix lack crucial details on how the LLM generates rules and calculates hybrid probabilities, and to what extent this is based on the content provided in the prompts. This is essential for determining whether the method can generalize to more diverse tasks. 3. The paper consistently emphasizes complex tasks, yet the experimental envir

Reviewer 02 · Rating 8 · Confidence 4

Strengths

The paper is very well written and easy to follow. The proposed framework is quite novel to the best of my knowledge, since the level 2 and 1 systems are usually much more separated than in BlendRL, and the results against vanilla PPO or a symbolic approach (NUDGE) are promising.

Weaknesses

My biggest concern with this work is the lack of empirical comparison with any other neuro-symbolic baselines, e.g. [1-4]. Since BlendRL also relies heavily on object-based representations and relational learning, it would have been good (although I don't consider this critical) to include some contrast with deep learning approaches for this kind of learning, e.g. [1, 5-6]. While the point above would raise my confidence in the paper, I still think that the novelty of the approach and the resu

Reviewer 03 · Rating 8 · Confidence 4

Strengths

The idea of learning a mixture of neural policy and symbolic policy is novel. Given the presented experimental results, it seems to lead to nicely interpretable decision rules (at least in the symbolic part). Also, surprisingly to me, it seems that it automatically learns to make "reflex" decisions preferably using the neural policy. The overall proposition nicely integrates several previous techniques to offer an end-to-end method, with little human inputs.

Weaknesses

The paper should be more self-contained. For instance, this work builds on several existing propositions (notably by Shindo et al. and Delfosse et al.). The authors should recall and provide more details of those previous works to help the reader more easily understand BlendRL and appreciate its novelty. For instance, I think the current presentation of the differentiable forward reasoner is too light, e.g., how is the set \mathcal C related to the forward reasoning graph? What are the advantages/disad

Videos

BlendRL: A Framework for Merging Symbolic and Neural Policy Learning · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics