Loading paper
Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision | Tomesphere