Loading paper
What should post-training optimize? A test-time scaling law perspective | Tomesphere