School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs