BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Tarjei Paule Hage; Markus J. Buehler

arXiv:2603.04124·cs.AI·March 5, 2026


Open Access · 1 model · 2 datasets

TL;DR

This paper investigates whether reinforcement learning with verifiable rewards can teach compact language models to reason about physics, revealing limitations in generalization and the need for structured reasoning scaffolding.

Contribution

It demonstrates that reinforcement learning with exact physics rewards leads to procedural templates rather than true understanding, highlighting a key limitation in current methods.

Findings

01

The best BeamPERL checkpoint improves Pass@1 by 66.7% over the base model

02

Generalizes compositionally (more loads) but fails under topological shifts (moved supports)

03

Continued optimization degrades robustness even as reward is maintained
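For readers unfamiliar with the metric in the first finding, here is a minimal sketch of how Pass@1 is commonly estimated: the per-problem fraction of sampled completions judged correct, averaged over problems. The function name `pass_at_1` and the input layout are hypothetical, and this listing does not state how the paper samples or aggregates, so treat it as an illustration only.

```python
from typing import List


def pass_at_1(results: List[List[bool]]) -> float:
    """Estimate Pass@1 from per-problem correctness flags.

    `results[i]` holds one boolean per sampled completion for problem i.
    For k = 1 the usual unbiased pass@k estimator reduces to c / n, the
    fraction of correct samples, which is then averaged over problems.
    """
    per_problem = [sum(flags) / len(flags) for flags in results if flags]
    if not per_problem:
        raise ValueError("no non-empty problems to score")
    return sum(per_problem) / len(per_problem)


# Hypothetical toy data: three problems, two samples each.
toy = [[True, False], [True, True], [False, False]]
score = pass_at_1(toy)
```

With the toy data above, the per-problem fractions are 0.5, 1.0, and 0.0, so the estimate is 0.5.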

Abstract

Can reinforcement learning with hard, verifiable rewards teach a compact language model to reason about physics, or does it primarily learn to pattern-match toward correct answers? We study this question by training a 1.5B-parameter reasoning model on beam statics, a classic engineering problem, using parameter-efficient RLVR with binary correctness rewards from symbolic solvers, without teacher-generated reasoning traces. The best BeamPERL checkpoint achieves a 66.7% improvement in Pass@1 over the base model. However, the learned competence is anisotropic: the model generalizes compositionally (more loads) but fails under topological shifts (moved supports) that require the same equilibrium equations. Intermediate checkpoints yield the strongest reasoning, whereas continued optimization degrades robustness even as reward is maintained. These findings reveal a key limitation of outcome-level…
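To make the idea of a binary correctness reward concrete, the sketch below scores predicted support reactions for a beam on two supports under downward point loads. It is an illustration under stated assumptions, not the authors' pipeline: exact rational arithmetic stands in for the symbolic solver, and the names `support_reactions` and `binary_reward`, the tolerance, and the problem encoding are all hypothetical. Note that moving a support changes only the inputs, not the two equilibrium equations.

```python
from fractions import Fraction
from typing import List, Tuple


def support_reactions(loads: List[Tuple[float, float]], x_a, x_b):
    """Vertical reactions (R_a, R_b) at supports located at x_a and x_b.

    `loads` is a list of (position, magnitude) downward point loads.
    Uses the two statics equations: sum of moments about A is zero,
    and sum of vertical forces is zero.
    """
    x_a, x_b = Fraction(x_a), Fraction(x_b)
    if x_a == x_b:
        raise ValueError("supports must be at distinct positions")
    total = sum(Fraction(p) for _, p in loads)
    moment_about_a = sum(Fraction(p) * (Fraction(x) - x_a) for x, p in loads)
    r_b = moment_about_a / (x_b - x_a)  # moment equilibrium about A
    r_a = total - r_b                   # vertical force equilibrium
    return r_a, r_b


def binary_reward(predicted, loads, x_a, x_b, rel_tol=1e-3) -> float:
    """Return 1.0 if both predicted reactions match the reference, else 0.0.

    The tolerance is an assumed value; the source does not specify one.
    """
    expected = support_reactions(loads, x_a, x_b)
    for got, want in zip(predicted, expected):
        if abs(got - want) > rel_tol * max(1.0, abs(want)):
            return 0.0
    return 1.0


# Same load, supports at the ends versus one support moved inward.
ends = support_reactions([(4, 100)], 0, 10)
moved = support_reactions([(4, 100)], 2, 10)
```

In this toy case the end-supported beam gives reactions of 60 and 40, while moving the left support to x = 2 gives 75 and 25, so an answer memorized for the first layout earns zero reward on the second.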


Code & Models

Models

lamm-mit/BeamPERL (Hugging Face model)

Datasets


Taxonomy

Topics: Machine Learning in Materials Science · Model Reduction and Neural Networks · Quantum many-body systems