Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning