MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Shuaijie She; Wei Zou; Shujian Huang; Wenhao Zhu; Xiang Liu; Xiang Geng; Jiajun Chen

arXiv:2401.06838·cs.CL·April 16, 2024·1 cites

Open Access · 1 Repo · 8 Models

TL;DR

This paper introduces MAPO, a framework that improves multilingual reasoning in large language models by aligning reasoning processes in non-dominant languages with those in the dominant language through preference optimization, yielding significant accuracy gains and more consistent reasoning across languages.

Contribution

MAPO is the first to apply preference optimization with translation-based alignment to enhance multilingual reasoning in LLMs.

Findings

01 Achieved +16.2% accuracy on the MSVAMP benchmark

02 Improved reasoning consistency across languages

03 Enhanced performance on multiple reasoning benchmarks

Abstract

Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all…
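The preference signal described in the abstract can be illustrated with a minimal sketch. It assumes each sampled reasoning in a non-dominant language already carries an alignment score (for example, one derived from an off-the-shelf translation model, which is not modeled here), ranks samples by that score to form preference pairs, and evaluates the standard DPO loss for one pair. The names `build_preference_pairs`, `dpo_loss`, `min_gap`, and `beta` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, min_gap=0.0):
    """Turn scored reasoning samples into (chosen, rejected) pairs.

    candidates: list of (reasoning_text, alignment_score) tuples, where the
    score is ASSUMED to come from an external translation model that rates
    how well the reasoning aligns with the dominant-language reasoning.
    min_gap: hypothetical threshold; pairs whose scores differ by no more
    than this are skipped as too ambiguous to express a preference.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) <= min_gap:
            continue
        if score_a > score_b:
            pairs.append((text_a, text_b))
        else:
            pairs.append((text_b, text_a))
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative scores only; these are made-up values, not results.
samples = [("reasoning A", 0.9), ("reasoning B", 0.2), ("reasoning C", 0.5)]
pairs = build_preference_pairs(samples)
loss_at_init = dpo_loss(-3.0, -3.0, -3.0, -3.0)
```

When the policy equals the reference model, the margin is zero and the loss is log 2; the loss falls as the policy assigns relatively more probability to the better-aligned reasoning than the reference does.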

Code & Models

Repositories

njunlp/mapo (PyTorch · Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multi-Criteria Decision Making

Methods: ALIGN · Entropy Regularization · Proximal Policy Optimization