Adversarial Stress Testing of SPARK Humanoid Safety Filters
Saurav Ghosh, Abdou Sow, and Luke Zhang

TL;DR
This paper evaluates the robustness of SPARK humanoid safety filters through replication and stress testing in simulated environments, revealing how safety behaviors change under challenging conditions.
Contribution
It introduces a systematic stress testing framework for humanoid safety filters and provides insights into their failure modes beyond nominal benchmarks.
Findings
Some safety methods track the goal more closely; others reduce collision steps more effectively.
Safety behavior varies with obstacle crowding, noise, and delays.
Humanoid safety evaluation should include stress tests to reveal failure modes.
Abstract
Humanoid robots are difficult to deploy safely because they have high-dimensional bodies, face many collision constraints, and must operate near people and obstacles. Safety filters help by modifying a nominal control action when it may violate collision-avoidance constraints. Still, nominal benchmark scores do not fully show how these filters behave in harder environments. In this work, we study the robustness of SPARK humanoid safety filters through replication and stress testing. We replicate the SPARK benchmark case G1SportMode_D1_WG_SO_v1 in MuJoCo and evaluate RSSA, RSSS, SSA, CBF, PFM, and SMA under controlled random seeds. We also build a post-processing pipeline that converts raw SPARK logs into goal-tracking, minimum-distance, and collision-step metrics. Our results show that some methods track the goal more closely, while others reduce collision steps more effectively. The stress…
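The three metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the per-step array layout, the function name `summarize_episode`, and the `collision_threshold` parameter are all assumptions made for illustration, and the actual SPARK log format may differ.

```python
import numpy as np


def summarize_episode(robot_xy, goal_xy, obstacle_dist, collision_threshold=0.0):
    """Reduce one episode of per-step data to three summary metrics.

    Hypothetical inputs (not the real SPARK log schema):
      robot_xy      -- (T, 2) robot base position at each step
      goal_xy       -- (T, 2) goal position at each step
      obstacle_dist -- (T, K) signed distance to each of K obstacles
    A step counts as a collision step when its smallest obstacle
    distance is at or below collision_threshold (an assumed convention).
    """
    robot_xy = np.asarray(robot_xy, dtype=float)
    goal_xy = np.asarray(goal_xy, dtype=float)
    obstacle_dist = np.asarray(obstacle_dist, dtype=float)

    # Goal tracking: Euclidean distance between robot and goal per step.
    goal_error = np.linalg.norm(robot_xy - goal_xy, axis=1)

    # Minimum distance: closest obstacle at each step.
    per_step_min = obstacle_dist.min(axis=1)

    return {
        "mean_goal_error": float(goal_error.mean()),
        "min_distance": float(per_step_min.min()),
        "collision_steps": int((per_step_min <= collision_threshold).sum()),
    }
```

Applying a function like this to each method and seed, then averaging over seeds, would give one row per safety filter and make the trade-off between goal tracking and collision steps directly comparable.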
