MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

TL;DR
This paper introduces MAPO, a framework that improves multilingual reasoning in large language models by using preference optimization to align reasoning processes in non-dominant languages with those in the dominant language. The result is significant performance gains and more consistent reasoning across languages.
Contribution
MAPO is the first to apply preference optimization with translation-based alignment to enhance multilingual reasoning in LLMs.
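The idea of using translation-based alignment as a preference signal can be illustrated with a short Python sketch. It assumes a hypothetical scorer that returns the log-probability a translation model assigns to translating a non-dominant-language answer into the dominant-language (e.g. English) reasoning. The names `TranslationScorer`, `alignment_score`, `build_pairs` and `toy_scorer`, as well as the length normalization and the pairing rule, are illustrative assumptions and not the authors' code. `toy_scorer` is only a word-overlap stand-in so the example runs without a real translation model.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple

# ASSUMPTION (hypothetical interface): a scorer returns
# (log P(target | source), number of target tokens) under some translation model.
TranslationScorer = Callable[[str, str], Tuple[float, int]]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # candidate better aligned with the dominant-language reasoning
    rejected: str  # candidate less aligned with it


def alignment_score(candidate: str, pivot: str, scorer: TranslationScorer) -> float:
    """Length-normalized log-probability of translating the non-dominant-language
    `candidate` into the dominant-language `pivot` reasoning."""
    logprob, n_tokens = scorer(candidate, pivot)
    return logprob / max(n_tokens, 1)


def build_pairs(prompt: str, candidates: List[str], pivot: str,
                scorer: TranslationScorer, margin: float = 0.0) -> List[PreferencePair]:
    """Turn every candidate pair whose alignment scores differ by more than
    `margin` into a (chosen, rejected) preference pair (hypothetical rule)."""
    scored = sorted(((alignment_score(c, pivot, scorer), c) for c in candidates),
                    key=lambda item: item[0], reverse=True)
    pairs = []
    # combinations() keeps list order, so the first item always has the higher score.
    for (hi_score, hi), (lo_score, lo) in combinations(scored, 2):
        if hi_score - lo_score > margin:
            pairs.append(PreferencePair(prompt, chosen=hi, rejected=lo))
    return pairs


def toy_scorer(source: str, target: str) -> Tuple[float, int]:
    """Hypothetical stand-in for a translation model: subtracts 1 for each
    target token missing from the source. For demonstration only."""
    source_tokens = set(source.split())
    target_tokens = target.split()
    misses = sum(1 for tok in target_tokens if tok not in source_tokens)
    return -float(misses), len(target_tokens)


# Usage: two German candidates scored against an English pivot reasoning.
pivot = "3 + 2 = 5"
good = "Tom hat 3 und kauft 2, also 3 + 2 = 5"
bad = "Tom hat 3 und kauft 2, also 3 * 2 = 6"
pairs = build_pairs("Wie viele Äpfel hat Tom?", [bad, good], pivot, toy_scorer)
print(pairs[0].chosen)
```

The resulting pairs are in the (prompt, chosen, rejected) form that preference optimizers such as DPO consume. A real setup would swap `toy_scorer` for actual translation-model likelihoods.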
Findings
Achieved +16.2% on the MSVAMP benchmark
Improved reasoning consistency across languages
Enhanced performance on multiple reasoning benchmarks
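Cross-language consistency can be quantified in several ways. One simple measure is sketched below: for each language, it computes the fraction of questions whose final answer matches the answer given in the dominant language. This is a hypothetical illustration (including the name `answer_consistency` and exact string matching of answers), not necessarily the metric the paper reports.

```python
from typing import Dict, List


def answer_consistency(answers: Dict[str, List[str]], pivot: str = "en") -> Dict[str, float]:
    """For each non-pivot language, return the fraction of questions whose final
    answer equals the answer produced in the `pivot` language (hypothetical metric)."""
    reference = answers[pivot]
    if not reference:
        raise ValueError("pivot language has no answers")
    result = {}
    for lang, preds in answers.items():
        if lang == pivot:
            continue
        if len(preds) != len(reference):
            raise ValueError(f"{lang}: expected {len(reference)} answers, got {len(preds)}")
        agree = sum(p == r for p, r in zip(preds, reference))
        result[lang] = agree / len(reference)
    return result


# Usage with made-up answers to four questions (illustrative data, not results).
scores = answer_consistency({
    "en": ["5", "12", "7", "3"],
    "de": ["5", "12", "8", "3"],
    "zh": ["5", "10", "8", "3"],
})
print(scores)
```

Exact matching assumes that final answers have already been extracted and normalized (for example, "5.0" vs "5"). A real evaluation would need to handle that step.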
Abstract
Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all…
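Direct Preference Optimization is one of the two optimizers the abstract names. Its standard per-pair loss is −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). In MAPO's setting, y_w and y_l would be the better- and worse-aligned reasoning for the same question. The minimal Python sketch below computes this loss from summed sequence log-probabilities. The function names and the value β = 0.1 are assumptions for illustration, since the page does not specify the paper's training details. For the PPO variant, the alignment score would instead act as a reward, which is not sketched here.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: Sequence[float], policy_rejected: Sequence[float],
             ref_chosen: Sequence[float], ref_rejected: Sequence[float],
             beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of preference pairs.

    Each argument holds summed log-probabilities log pi(y|x): from the model
    being trained (policy_*) or from a frozen reference copy (ref_*).
    beta=0.1 is an arbitrary default, not a value taken from the paper.
    """
    n = len(policy_chosen)
    if n == 0 or not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # How much more the policy prefers the chosen answer than the reference does.
        margin = (pc - rc) - (pr - rr)
        losses.append(-log_sigmoid(beta * margin))
    return sum(losses) / n


# Usage: if the policy still equals the reference, the loss is log 2.
print(dpo_loss([-10.0], [-12.0], [-10.0], [-12.0]))
```

Minimizing this loss pushes the model to raise the likelihood of the better-aligned reasoning relative to the worse-aligned one, while the reference terms keep it anchored to its starting point.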
Code & Models
- 🤗 kevinpro/MetaMathOctopus-7B (model) · 7 downloads
- 🤗 kevinpro/MathOctopus-MAPO-DPO-7B (model) · 8 downloads
- 🤗 kevinpro/MetaMathOctopus-MAPO-DPO-7B (model) · 6 downloads
- 🤗 kevinpro/MetaMathOctopus-13B (model) · 9 downloads
- 🤗 kevinpro/MetaMathOctopus-MAPO-DPO-13B (model) · 64 downloads
- 🤗 kevinpro/MathOctopus-MAPO-DPO-13B (model) · 5 downloads
- 🤗 kevinpro/MistralMathOctopus-7B (model) · 1.2k downloads
- 🤗 kevinpro/MistralMathOctopus-MAPO-DPO-7B (model) · 22 downloads
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multi-Criteria Decision Making
Methods: ALIGN · Entropy Regularization · Proximal Policy Optimization
