Rewards as Labels: Revisiting RLVR from a Classification Perspective