System III: Learning with Domain Knowledge for Safety Constraints
Fazl Barez, Hosein Hasanbeig, Alessandro Abate

TL;DR
System III introduces a framework that incorporates domain knowledge expressed as first-order logic to guide safe exploration in reinforcement learning, improving safety and sample efficiency in safety-critical environments.
Contribution
The paper presents a novel approach that uses first-order logic to encode safety constraints and evaluates their satisfaction via p-norms, enhancing safe exploration in reinforcement learning.
Findings
Safer exploration in all tested environments
Improved sample efficiency over baseline methods
Effective constraint satisfaction in safety-critical tasks
Abstract
Reinforcement learning agents naturally learn from extensive exploration. Exploration is costly and can be unsafe in safety-critical domains. This paper proposes a novel framework for incorporating domain knowledge to help guide safe exploration and boost sample efficiency. Previous approaches impose constraints, such as regularisation parameters in neural networks, that rely on large sample sets and are often unsuitable for safety-critical domains, where agents should almost always avoid unsafe actions. In our approach, called System III, which is inspired by psychologists' notions of the brain's System I and System II, we represent domain expert knowledge of safety in the form of first-order logic. We evaluate the satisfaction of these constraints via p-norms in state vector space. In our formulation, constraints are analogous to hazards,…
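As a rough illustration of what "evaluating constraint satisfaction via p-norms in state vector space" could look like, the sketch below scores how well a state keeps its distance from a set of hazard points. It is not the paper's implementation. The function names, the `safe_radius` threshold, the clipping to [0, 1], and the use of `min` to stand in for a universal quantifier ("for all hazards") are all assumptions made for this example.

```python
import numpy as np


def p_norm_distance(state, hazard, p=2.0):
    """p-norm distance between a state vector and a hazard location."""
    diff = np.asarray(state, dtype=float) - np.asarray(hazard, dtype=float)
    return float(np.linalg.norm(diff, ord=p))


def constraint_satisfaction(state, hazards, safe_radius, p=2.0):
    """Degree in [0, 1] to which the state stays clear of every hazard.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    each hazard contributes distance / safe_radius, clipped at 1.0, and
    the "for all hazards" quantifier is approximated by the minimum.
    """
    degrees = [
        min(1.0, p_norm_distance(state, h, p) / safe_radius) for h in hazards
    ]
    return min(degrees)


if __name__ == "__main__":
    agent_state = [0.0, 0.0]
    hazard_points = [[3.0, 4.0], [10.0, 0.0]]
    # Print the satisfaction degree for a hypothetical safe radius of 10.
    print(constraint_satisfaction(agent_state, hazard_points, safe_radius=10.0))
```

A score of this kind could, for instance, be used to penalise or mask actions that lead towards low-satisfaction states. How System III actually uses the evaluation is not stated in the excerpt above.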
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
