Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training