Reward Shaping for Inference-Time Alignment: A Stackelberg Game Perspective | Tomesphere