Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion