Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Avinash Anand; Kritarth Prasad; Chhavi Kirtani; Ashwin R Nair; Mohit Gupta; Saloni Garg; Anurag Gautam; Snehal Buldeo; Rajiv Ratn Shah

arXiv:2412.06827 · cs.LG · December 11, 2024

Open Access

TL;DR

This paper introduces a reinforcement learning approach with human and AI feedback to significantly improve large language models' reasoning and accuracy on physics problems, especially in complex arithmetic and conceptual understanding.

Contribution

It presents a novel RL-based training method, RLHAIF, that enhances LLMs' physics reasoning capabilities beyond existing prompt engineering techniques.

Findings

01. RLHAIF improves LLM performance on physics questions.

02. MISTRAL-PPO achieves high METEOR and Reasoning scores.

03. Reinforcement learning with human-AI feedback outperforms baseline models.

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems, particularly in advanced arithmetic and conceptual understanding. While some research has explored ways to enhance LLMs in physics education using techniques such as prompt engineering and Retrieval-Augmented Generation (RAG), little effort has gone into addressing their limitations in physics reasoning. This paper presents a novel approach to improving LLM performance on physics questions using Reinforcement Learning with Human and Artificial Intelligence Feedback (RLHAIF). We evaluate several reinforcement learning methods, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and ReMax optimization. These methods are chosen to investigate RL policy performance with different settings on…
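
Of the methods named in the abstract, DPO is the one that trains directly on preference pairs (a preferred and a dispreferred answer to the same question) without a separate reward model. As a rough illustration only, the standard per-pair DPO loss can be sketched as below. This is the textbook form of the objective, not the paper's training code; the function name dpo_loss, its argument names, and the default beta=0.1 are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    All names and the default beta are illustrative assumptions.
    Inputs are log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") answers under the trained policy and
    under a frozen reference model.
    """
    # How much more the policy favors each answer than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; positive when the policy leans toward
    # the chosen answer more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written as a numerically stable softplus.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the preferred answer relative to the reference.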

Taxonomy

Topics: Neural Networks and Applications