SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning