Rethinking Reward Signals in Video GRPO: When Scores Become Targets