MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning

Yunze Lin

arXiv:2506.15706 · cs.LG · June 23, 2025

TL;DR

This paper introduces MDPO, a multi-granularity preference optimization method that improves the mathematical reasoning of LLMs on GSM8K and MATH by addressing the limitations of existing DPO techniques at different granularities of the reasoning process.

Contribution

The paper proposes a novel multi-granularity approach to optimize LLMs for mathematical reasoning, unifying training objectives across solution, inference, and step levels to enhance correctness and computational ability.
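The summary names a step level among MDPO's granularities but does not say how step-level preference pairs are built. As a point of comparison only, the sketch below shows one approach used in related work (Step-DPO): locate the first step where a correct and an incorrect solution diverge, and treat the two differing steps as a preference pair conditioned on the shared prefix. This is not MDPO's documented procedure; the helper names `split_steps` and `first_divergent_step` and the one-step-per-line convention are hypothetical assumptions.

```python
from typing import List, Optional, Tuple


def split_steps(solution: str) -> List[str]:
    # Hypothetical convention: each non-empty line is one reasoning step.
    return [line.strip() for line in solution.splitlines() if line.strip()]


def first_divergent_step(
    chosen: str, rejected: str
) -> Optional[Tuple[List[str], str, str]]:
    """Locate the first step where a correct and an incorrect solution differ.

    Returns (shared_prefix_steps, chosen_step, rejected_step), i.e. a
    step-level preference pair conditioned on the shared prefix, or None
    if the two solutions agree on every step they both contain.
    """
    chosen_steps = split_steps(chosen)
    rejected_steps = split_steps(rejected)
    for i, (c_step, r_step) in enumerate(zip(chosen_steps, rejected_steps)):
        if c_step != r_step:
            return chosen_steps[:i], c_step, r_step
    return None


chosen = "Let x = 3.\nThen 2x = 6.\nAnswer: 6"
rejected = "Let x = 3.\nThen 2x = 5.\nAnswer: 5"
print(first_divergent_step(chosen, rejected))
# (['Let x = 3.'], 'Then 2x = 6.', 'Then 2x = 5.')
```

Only the first divergent step is returned because later steps in the incorrect solution are conditioned on an already wrong step and so carry little contrastive signal of their own.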

Findings

01

Achieved 1.7% and 0.9% improvements on GSM8K with Qwen2 and Llama3, respectively.

02

Achieved 2.3% and 1.2% improvements on the MATH dataset with Qwen2 and Llama3, respectively.

03

Outperformed existing DPO variants in experiments.

Abstract

Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) because it requires every reasoning step to be correct. Researchers have strengthened the mathematical reasoning abilities of LLMs through supervised fine-tuning, but because this approach cannot suppress incorrect outputs, hallucinations easily arise. Recently, Direct Preference Optimization (DPO) has been widely adopted to align LLMs with human intent, using preference data to keep them from generating incorrect outputs. However, it has shown limited benefits in long-chain mathematical reasoning, mainly because DPO struggles to capture the differences between accepted and rejected answers in long-chain preference data. The inconsistency between DPO training and LLMs' generation metrics also reduces its effectiveness in suppressing incorrect outputs. We propose the…
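For reference, the abstract builds on the standard DPO objective (Rafailov et al., 2023), which scores a preference pair by how much more the policy, relative to a frozen reference model, favors the accepted answer over the rejected one. The sketch below computes that per-pair loss from summed sequence log-probabilities. It is the generic DPO loss, not MDPO's multi-granularity objective, and the function names and the default `beta=0.1` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for a single preference pair.

    Each *_logp argument is the summed token log-probability of a full
    response under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the accepted answer is preferred.
    return -log_sigmoid(margin)
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; raising the policy's log-probability of the accepted answer (or lowering that of the rejected one) drives the loss below log 2.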

Taxonomy

Topics: Constraint Satisfaction and Optimization · Topic Modeling · Multimodal Machine Learning Applications