Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization