MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization