Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL