Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models