Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning