EBPO: Empirical Bayes Shrinkage for Stabilizing Group-Relative Policy Optimization | Tomesphere