ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Yifan Xu; Xiao Liu; Xinghan Liu; Zhenyu Hou; Yueyan Li; Xiaohan Zhang; Zihan Wang; Aohan Zeng; Zhengxiao Du; Wenyi Zhao; Jie Tang; Yuxiao Dong

arXiv:2404.02893·cs.CL·April 4, 2024·1 cites

TL;DR

This paper introduces a Self-Critique pipeline that enhances large language models' mathematical problem-solving abilities while maintaining language skills, using a feedback-based training approach with a dedicated critique model.

Contribution

The work presents a novel Self-Critique pipeline that improves LLMs' math skills through self-generated feedback and fine-tuning, outperforming larger models.

Findings

01 Significant improvement in mathematical problem-solving accuracy.

02 Maintains or enhances language capabilities.

03 Outperforms larger baseline models.

Abstract

Large language models (LLMs) have shown excellent mastery of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs' mathematics have been developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems. In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic datasets and our newly created challenging dataset, MathUserEval. Results show that our…
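The two data-collection stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the score scale, the `keep_threshold` and `min_gap` values, and the toy critique function are all assumptions made for illustration. The sketch only shows the general idea of using a critique score to (a) keep high-scoring sampled answers for rejective fine-tuning and (b) form chosen/rejected pairs for direct preference optimization.

```python
from typing import Callable, Dict, List, Tuple

# A critique function maps (question, answer) to a numeric score.
# In the paper this role is played by a trained Math-Critique model;
# here it is just a callable supplied by the caller.
Critique = Callable[[str, str], float]


def select_for_rft(
    samples: Dict[str, List[str]],
    critique: Critique,
    keep_threshold: float = 8.0,  # assumed value, not from the paper
) -> List[Tuple[str, str]]:
    """Keep (question, answer) pairs whose critique score passes the threshold."""
    kept = []
    for question, answers in samples.items():
        for answer in answers:
            if critique(question, answer) >= keep_threshold:
                kept.append((question, answer))
    return kept


def build_preference_pairs(
    samples: Dict[str, List[str]],
    critique: Critique,
    min_gap: float = 4.0,  # assumed value, not from the paper
) -> List[Tuple[str, str, str]]:
    """Pair the best and worst answer per question as (question, chosen, rejected)."""
    pairs = []
    for question, answers in samples.items():
        if len(answers) < 2:
            continue
        scored = [(critique(question, a), a) for a in answers]
        best = max(scored, key=lambda s: s[0])
        worst = min(scored, key=lambda s: s[0])
        # Only keep pairs where the critique clearly separates the two answers.
        if best[0] - worst[0] >= min_gap:
            pairs.append((question, best[1], worst[1]))
    return pairs


def toy_critique(question: str, answer: str) -> float:
    """Hypothetical stand-in for a learned critique model (toy rule only)."""
    return 10.0 if answer.strip().endswith("4") else 1.0


toy_samples = {
    "What is 2 + 2?": ["2 + 2 = 4", "2 + 2 = 5", "The answer is 4"],
}

rft_data = select_for_rft(toy_samples, toy_critique)
dpo_pairs = build_preference_pairs(toy_samples, toy_critique)
```

In this toy run, `rft_data` holds the two answers ending in 4, and `dpo_pairs` holds one triple with "2 + 2 = 4" as the chosen answer and "2 + 2 = 5" as the rejected one. A real pipeline would sample the answers from the LLM itself and pass the resulting data to a fine-tuning and preference-optimization trainer.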
