Training Verifiably Robust Agents Using Set-Based Reinforcement Learning
Manuel Wendl, Lukas Koller, Tobias Ladner, Matthias Althoff

TL;DR
This paper introduces a method to train reinforcement learning agents that are verifiably robust against input disturbances by using set-based training and reachability analysis, enhancing safety in critical applications.
Contribution
It extends formal verification techniques to reinforcement learning in continuous spaces, training neural networks on entire sets of perturbed inputs for improved robustness.
Findings
Verifiably more robust agents demonstrated across four benchmarks.
Set-based training improves worst-case reward performance.
Resulting agents are more applicable in safety-critical environments.
Abstract
Reinforcement learning often uses neural networks to solve complex control tasks. However, neural networks are sensitive to input perturbations, which makes their deployment in safety-critical environments challenging. This work lifts recent results from formally verifying neural networks against such disturbances to reinforcement learning in continuous state and action spaces using reachability analysis. While previous work mainly focuses on adversarial attacks for robust reinforcement learning, we train neural networks utilizing entire sets of perturbed inputs and maximize the worst-case reward. The obtained agents are verifiably more robust than agents obtained by related work, making them more applicable in safety-critical environments. This is demonstrated with an extensive empirical evaluation of four different benchmarks.
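The idea of reasoning about an entire set of perturbed inputs, rather than single adversarial samples, can be illustrated with a minimal sketch. It uses plain interval (box) bounds, which are a simpler stand-in for the reachability analysis described in the abstract and are not the authors' method. The network weights in `LAYERS`, the input `x`, and the perturbation radius `eps` are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: interval (box) propagation through a tiny ReLU
# network. This is a simplified stand-in, not the paper's set representation.

def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight keeps the bound order;
            # a negative weight swaps lower and upper bounds.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def network_bounds(x, eps, layers):
    """Enclose the network output for all inputs within eps of x (max-norm)."""
    lower = [xi - eps for xi in x]
    upper = [xi + eps for xi in x]
    for i, (weights, bias) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, weights, bias)
        if i < len(layers) - 1:  # no activation after the last layer
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


def forward(x, layers):
    """Ordinary forward pass for a single input point."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical two-layer network with two inputs and one output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -2.0]], [0.1]),
]

lo, hi = network_bounds([1.0, 0.5], 0.1, LAYERS)
nominal = forward([1.0, 0.5], LAYERS)
print("output enclosure:", lo, hi, "nominal output:", nominal)
```

If the output were a scalar score such as a critic value, the lower bound `lo[0]` would be a guaranteed worst case over the whole input box, which is the kind of quantity a worst-case training objective would push upward. The enclosure is sound but may be loose.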
Peer Reviews
Decision: Submitted to ICLR 2026
- The work extends **set-based neural network training** to the reinforcement learning setting, introducing gradient sets into both actor and critic updates.
- It offers an elegant synthesis of *verification-oriented training* and *policy optimization*, bridging robust RL and formal methods.
- The proposed set-based regression loss and policy gradient are mathematically grounded in probabilistic reasoning, linking the *likelihood–prior decomposition* with set diameter minimization.
- The r

1. **Theoretical limitations of over-approximation**
   - The approach relies on *outer approximations* of reachable sets, yet the paper provides no quantitative analysis of **set overgrowth or error bounds**.
   - This makes it unclear whether the final reachability sets meaningfully constrain the true behavior, especially over long horizons.
2. **Lack of computational analysis**
   - While the framework is elegant, there is no complexity or runtime discussion of *set propagation* or *gr

**Originality**
- The work is an interesting and timely attempt to unify **formal verification and reinforcement learning**, using set-based propagation within the training loop.
- The idea of **gradient sets** and integrating them with actor–critic architectures is novel and potentially impactful.

**Quality**
- The theoretical exposition (e.g., use of zonotopes, set propagation through affine and activation layers) is mostly sound.
- The proposed framework connects well to existing ver

1. **Over-approximation accumulation**
   - The framework propagates set over-approximations step-by-step in an RL setting, but there is **no theoretical bound** on how the set diameter grows over time. Without such analysis, the reachability sets may become **too conservative** (i.e., trivial or uninformative), limiting practical use.
2. **Goal reachability and certification**
   - The paper uses over-approximated reachable sets to argue about goal attainment, but **overlap with the goal s
Significant novel contribution introducing first set-based RL algorithm bridging adversarial training and formal verification. Major theoretical innovation using gradient sets from entire perturbation sets. Results demonstrate up to 9x larger perturbation tolerance compared to state-of-the-art methods while maintaining formal verifiability across multiple verification frameworks (CORA, CROWN-Reach, JuliaReach, NVV). The paper tests across multiple established benchmarks (Navigation Task, 1D/2D Q
The Navigation Task incorporates safety through reward penalties rather than explicit constraints during training. The authors acknowledge that formal safety guarantees require separate verification beyond just reward lower bounds, limiting the direct safety claims. Evaluation focuses primarily on continuous control tasks with relatively low-dimensional state/action spaces. The approach's effectiveness on high-dimensional problems (e.g., image-based RL) or discrete action spaces remains unclear
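The reviewers' concern about over-approximations accumulating over long horizons can be made concrete with a small sketch of the classic wrapping effect. The setup is hypothetical: the dynamics are a pure 45° rotation, and the reachable set is enclosed by an axis-aligned box at every step. Boxes are chosen here only for simplicity. The reviews indicate the paper uses zonotopes, which represent linear maps exactly, so this example illustrates the general phenomenon rather than the behavior of the paper's method.

```python
import math

# Illustrative sketch only: wrapping effect of axis-aligned boxes under a
# hypothetical rotation dynamics. Not the paper's set representation.

def rotate_box(half_widths, angle):
    """Tightest axis-aligned box around a rotated, origin-centred box."""
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    hx, hy = half_widths
    return (c * hx + s * hy, s * hx + c * hy)


def box_growth(half_widths, angle, steps):
    """Re-enclose the set in a box after every step and record the sizes."""
    history = [half_widths]
    for _ in range(steps):
        half_widths = rotate_box(half_widths, angle)
        history.append(half_widths)
    return history


history = box_growth((1.0, 1.0), math.pi / 4, 8)

# A rotation preserves distances, so every truly reachable point stays
# within this radius of the origin at all times.
true_radius = math.hypot(1.0, 1.0)

for step, (hx, hy) in enumerate(history):
    print(f"step {step}: box half-widths = ({hx:.3f}, {hy:.3f}), "
          f"true radius bound = {true_radius:.3f}")
```

In this toy setting the box half-widths grow by a factor of √2 per step, while the true reachable set never leaves a disc of fixed radius. An enclosure that grows this way stays sound but quickly becomes uninformative, which is why a bound on set growth would matter for long-horizon guarantees.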
Taxonomy
Topics: Reinforcement Learning in Robotics
