A mixed policy to improve performance of language models on math problems

Gang Chen

arXiv:2307.08767·cs.CL·July 19, 2023

TL;DR

This paper introduces a mixed policy reinforcement learning approach with a two-level token exploration strategy to enhance the accuracy of language models on math problems, demonstrating over 2% performance improvement on GSM8K.

Contribution

It proposes a novel two-level token exploration policy combining probabilistic and deterministic methods for math problem solving in language models.

Findings

01

Achieved over 2% performance gain on GSM8K dataset.

02

Demonstrated effectiveness of mixed policy exploration in math reasoning.

03

Implemented a two-level token exploration strategy for improved accuracy.

Abstract

When solving math problems, most language models use a sampling strategy to predict the next word according to conditional probabilities. During a math reasoning step, this sampling may generate a wrong answer. Considering that math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In particular, we propose a two-level token exploration policy: the abstract level explores the next token probabilistically, and the second level is deterministic. Specifically, the abstract-level policy decides whether the token is an operator or an operand via probability sampling, while the second level deterministically selects the next token with the highest score in a greedy way. We test our method on the GSM8K dataset with the GPT-2 model and demonstrate a performance gain of more than 2%. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
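The two-level selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function name `two_level_select`, the dictionaries of scores, and the toy vocabulary are all hypothetical. The sketch also assumes that the greedy second level is restricted to tokens of the class sampled at the abstract level, which the abstract does not state explicitly.

```python
import math
import random


def softmax(scores):
    """Convert a list of raw scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_select(class_logits, token_scores, token_classes, rng):
    """Pick the next token with a mixed (stochastic + greedy) policy.

    class_logits:  dict mapping a class name ("operator"/"operand") to a logit
    token_scores:  dict mapping each candidate token to its model score
    token_classes: dict mapping each candidate token to its class name
    rng:           a random.Random instance used for the stochastic level
    All of these names are illustrative assumptions, not the paper's API.
    """
    # Abstract level: sample the token class according to its probability.
    classes = list(class_logits)
    probs = softmax([class_logits[c] for c in classes])
    chosen_class = rng.choices(classes, weights=probs, k=1)[0]

    # Second level: deterministic, greedy choice of the highest-scoring
    # token (assumed here to be limited to the sampled class).
    candidates = [t for t in token_scores if token_classes[t] == chosen_class]
    token = max(candidates, key=lambda t: token_scores[t])
    return chosen_class, token


# Toy usage with a four-token vocabulary (values are made up).
toy_scores = {"+": 1.2, "*": 2.5, "3": 4.0, "7": 0.1}
toy_classes = {"+": "operator", "*": "operator", "3": "operand", "7": "operand"}
toy_logits = {"operator": 0.3, "operand": -0.3}
print(two_level_select(toy_logits, toy_scores, toy_classes, random.Random(0)))
```

In this sketch only the operator/operand decision is random. Once the class is fixed, repeated calls always return the same token for that class, which mirrors the abstract's point that the second level is deterministic.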

Code & Models

Repositories

vividitytech/math_lm_rl
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Weight Decay · Discriminative Fine-Tuning · Residual Connection · Adam