Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data