Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game