PersonaMath: Boosting Mathematical Reasoning via Persona-Driven Data Augmentation

Jing Luo; Longze Chen; Run Luo; Liang Zhu; Chang Ao; Jiaming Li; Yukun Chen; Xin Cheng; Wen Yang; Jiayuan Su; Ahmadreza Argha; Hamid Alinejad-Rokny; Chengming Li; Shiwen Ni; Min Yang

arXiv:2410.01504 · cs.CL · February 24, 2025

TL;DR

PersonaMath introduces a persona-driven data augmentation method and a new dataset to significantly improve open-source LLMs' mathematical reasoning, achieving state-of-the-art results with less data.

Contribution

The paper presents a novel persona-driven data augmentation technique and a new dataset, PersonaMathQA, to enhance mathematical reasoning in open-source LLMs.

Findings

01 PersonaMath-7B achieves 61.2% accuracy on MATH
02 Outperforms baseline methods while using less training data
03 PersonaMathQA shows high quality and diversity

Abstract

While closed-source Large Language Models (LLMs) demonstrate strong mathematical problem-solving abilities, open-source models still face challenges with such tasks. To bridge this gap, we propose a data augmentation approach and introduce PersonaMathQA, a dataset derived from MATH and GSM8K, on which we train the PersonaMath models. Our approach consists of two stages: the first stage focuses on learning from Persona Diversification, and the second stage emphasizes learning from Reflection. In the first stage, we regenerate detailed chain-of-thought (CoT) solutions as instructions using a closed-source LLM and introduce a persona-driven data augmentation technique. This technique innovatively classifies personas based on occupations, significantly enhancing the dataset's diversity and quality. In the second stage, we incorporate reflection to fully leverage more challenging and…
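
The first-stage idea of persona-driven augmentation can be illustrated with a minimal sketch. It is a hypothetical illustration, not the authors' pipeline: the occupation groups, persona descriptions, prompt template, and the names PERSONAS_BY_OCCUPATION, AugmentationRequest, and build_persona_requests are all assumptions. The sketch only builds persona-conditioned rewrite prompts for one seed problem, grouped by occupation; sending them to a closed-source LLM, collecting CoT solutions, and filtering the outputs are left out.

```python
from dataclasses import dataclass

# Hypothetical occupation-based persona groups (illustrative only; not the paper's list).
PERSONAS_BY_OCCUPATION: dict[str, list[str]] = {
    "education": ["a middle-school math teacher", "a university tutor"],
    "finance": ["a retail bank analyst"],
    "engineering": ["a civil engineer designing a bridge"],
}

# Hypothetical rewrite template; the paper's actual prompt is not reproduced here.
REWRITE_TEMPLATE = (
    "You are {persona}. Rewrite the following math problem so it fits your "
    "everyday context, keeping the underlying mathematics unchanged, then "
    "solve it step by step.\n\nProblem: {question}"
)


@dataclass(frozen=True)
class AugmentationRequest:
    occupation: str
    persona: str
    prompt: str


def build_persona_requests(question: str, per_occupation: int = 1) -> list[AugmentationRequest]:
    """Create rewrite prompts for up to `per_occupation` personas from each occupation group."""
    requests = []
    for occupation, personas in PERSONAS_BY_OCCUPATION.items():
        for persona in personas[:per_occupation]:
            prompt = REWRITE_TEMPLATE.format(persona=persona, question=question)
            requests.append(AugmentationRequest(occupation, persona, prompt))
    return requests


if __name__ == "__main__":
    seed = "A rectangle has a length of 8 and a width of 5. What is its area?"
    for req in build_persona_requests(seed):
        print(req.occupation, "|", req.persona)
```

Grouping personas by occupation makes it easy to control how many variants each seed problem receives per group, which is one plausible way to spread rewrites across contexts; the paper's actual selection strategy may differ.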

Code & Models

Datasets

jingluo/PersonaMathQA (dataset)

Taxonomy

Topics: Persona Design and Applications · Human-Automation Interaction and Safety · Context-Aware Activity Recognition Systems