Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction