Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback
Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Mohit Gupta, Saloni Garg, Anurag Gautam, Snehal Buldeo, Rajiv Ratn Shah

TL;DR
This paper introduces a reinforcement learning approach with human and AI feedback to significantly improve large language models' reasoning and accuracy on physics problems, especially in complex arithmetic and conceptual understanding.
Contribution
It presents a novel RL-based training method, RLHAIF, that enhances LLMs' physics reasoning capabilities beyond existing prompt engineering techniques.
Findings
RLHAIF improves LLM performance on physics questions.
MISTRAL-PPO achieves high METEOR and Reasoning scores.
Reinforcement learning with human-AI feedback outperforms baseline models.
Abstract
Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems, particularly in advanced arithmetic and conceptual understanding. While some research has explored ways to enhance LLMs in physics education using techniques such as prompt engineering and Retrieval-Augmented Generation (RAG), not enough effort has been made to address their limitations in physics reasoning. This paper presents a novel approach to improving LLM performance on physics questions using Reinforcement Learning with Human and Artificial Intelligence Feedback (RLHAIF). We evaluate several reinforcement learning methods, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Remax optimization. These methods are chosen to investigate RL policy performance with different settings on…
Taxonomy
Topics: Neural Networks and Applications
