Oracle-Guided Soft Shielding for Safe Move Prediction in Chess
Prajit T Rajendran, Fabio Arnez, Huascar Espinoza, Agnes Delaborde, Chokri Mraidha

TL;DR
This paper introduces Oracle-Guided Soft Shielding (OGSS), a framework that enhances safe decision-making in chess by combining probabilistic safety models with imitation learning, reducing tactical errors during exploration.
Contribution
The paper presents OGSS, a novel safety mechanism that integrates oracle feedback into imitation learning to improve safety and exploration in chess AI.
Findings
OGSS reduces blunder rates during exploration.
OGSS maintains safety while increasing exploration ratio.
OGSS outperforms existing safety methods in chess experiments.
Abstract
In high-stakes environments, agents relying purely on imitation learning or reinforcement learning often struggle to avoid safety-critical errors during exploration. Existing reinforcement learning approaches for environments such as chess require hundreds of thousands of episodes and substantial computational resources to converge. Imitation learning, on the other hand, is more sample-efficient but is brittle under distributional shift and lacks mechanisms for proactive risk avoidance. In this work, we propose Oracle-Guided Soft Shielding (OGSS), a simple yet effective framework for safer decision-making that enables safe exploration by learning a probabilistic safety model from oracle feedback in an imitation learning setting. Focusing on the domain of chess, we train a model to predict strong moves based on past games, and separately learn a blunder prediction model from Stockfish…
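The abstract excerpt does not say how the move predictor and the blunder predictor are combined, so the following is only a minimal illustrative sketch of one way a soft shield could work, not the authors' method. It assumes each candidate move comes with a policy probability and a predicted blunder probability, and it down-weights risky moves instead of forbidding them. The function name `soft_shield`, the `penalty` exponent, and the example numbers are all hypothetical.

```python
def soft_shield(policy_probs, blunder_probs, penalty=1.0):
    """Reweight move probabilities by predicted safety (illustrative assumption).

    policy_probs:  probability of each candidate move from the imitation model.
    blunder_probs: predicted probability that each move is a blunder.
    penalty:       hypothetical knob; larger values suppress risky moves harder.
    """
    if len(policy_probs) != len(blunder_probs):
        raise ValueError("policy_probs and blunder_probs must have equal length")

    # Soft shielding: scale each move by its estimated safety, never hard-mask it.
    weights = [p * (1.0 - b) ** penalty
               for p, b in zip(policy_probs, blunder_probs)]
    total = sum(weights)

    # If every move is judged a certain blunder, fall back to the raw policy.
    if total == 0.0:
        return list(policy_probs)
    return [w / total for w in weights]


# Made-up numbers: the policy's favourite move (index 0) is flagged as risky.
policy = [0.5, 0.3, 0.2]
blunder = [0.9, 0.1, 0.2]
shielded = soft_shield(policy, blunder)
best_move_index = max(range(len(shielded)), key=shielded.__getitem__)
```

In this toy case the shield shifts most of the probability mass away from the risky first move while still leaving it a nonzero chance, which is what distinguishes a soft shield from a hard action mask.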
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Robot Manipulation and Learning
