MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning