Neuro-Symbolic Data Generation for Math Reasoning

Zenan Li; Zhi Zhou; Yuan Yao; Yu-Feng Li; Chun Cao; Fan Yang; Xian Zhang; Xiaoxing Ma

arXiv:2412.04857 · cs.AI · December 9, 2024

TL;DR

This paper introduces a neuro-symbolic method for generating high-quality mathematical datasets to improve LLM reasoning, showing that realigning models with this data enhances their performance.

Contribution

The paper presents a novel neuro-symbolic framework for automated math data generation that improves LLM reasoning capabilities by augmenting training data.

Findings

1. The generated datasets are high-quality and diverse.
2. LLMs realigned with the generated data (LLaMA-2 and Mistral) outperform state-of-the-art models.
3. The method combines the informalization abilities of LLMs with the symbolic reasoning of math solvers.

Abstract

A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both diversity and validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework combining the intuitive informalization strengths of LLMs and the precise symbolic reasoning of math solvers, along with projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Empirical experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data,…
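The abstract names three ingredients: LLM informalization, solver-based validity checks, and projected MCMC sampling over a symbolic problem space. The sketch below is a toy illustration of how those pieces can fit together under stated assumptions; it is not the authors' implementation. Everything in it is hypothetical: the problem class `LinearProblem` (a one-variable linear equation standing in for a formalized problem), the box bounds `LOW`/`HIGH`, the step size, and the helpers `project`, `is_valid`, `solve`, and `projected_walk`. `render` stands in for LLM informalization and `is_valid` stands in for a math solver.

```python
import random
from dataclasses import dataclass

LOW, HIGH = 1, 50  # hypothetical box bounds for every coefficient


@dataclass(frozen=True)
class LinearProblem:
    """Hypothetical symbolic form of a seed problem: a*x + b = c."""
    a: int
    b: int
    c: int

    def render(self) -> str:
        # Stand-in for LLM informalization: turn the symbolic form into text.
        return f"Solve for x: {self.a}x + {self.b} = {self.c}."


def project(v: int) -> int:
    # Project a proposed coefficient back onto the box [LOW, HIGH].
    return max(LOW, min(HIGH, v))


def is_valid(p: LinearProblem) -> bool:
    # Stand-in for a solver check: the equation has a positive integer answer.
    return p.a != 0 and (p.c - p.b) % p.a == 0 and (p.c - p.b) // p.a > 0


def solve(p: LinearProblem) -> int:
    # Exact answer of a valid problem (supervision label for the new sample).
    return (p.c - p.b) // p.a


def projected_walk(seed: LinearProblem, steps: int, rng: random.Random) -> list:
    """Mutate `seed` with a projected random walk that stays in the valid set."""
    assert is_valid(seed), "seed problem must be valid"
    current, samples = seed, []
    for _ in range(steps):
        # Propose: perturb each coefficient by a small integer, then project.
        proposal = LinearProblem(
            *(project(v + rng.randint(-3, 3)) for v in (current.a, current.b, current.c))
        )
        # Uniform target over valid problems: move only to valid proposals.
        if is_valid(proposal):
            current = proposal
        samples.append(current)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    seed = LinearProblem(3, 4, 19)  # x = 5
    for p in projected_walk(seed, 5, rng):
        print(p.render(), "answer:", solve(p))
```

Rejecting invalid proposals keeps every sample solvable, and `solve` supplies its label. Because clamping at the box boundary makes the proposal asymmetric there, this toy walk is not an exact sampler of the uniform distribution over valid problems; it only illustrates the propose, project, and validate loop.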

Taxonomy

Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems · Evolutionary Algorithms and Applications