Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner