Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training