NRGPT: An Energy-based Alternative for GPT

Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

arXiv:2512.16762 · cs.LG · May 4, 2026

TL;DR

NRGPT introduces an energy-based modification to GPT that treats inference as exploration of an energy landscape, and reports good performance on the Shakespeare, ListOPS, and OpenWebText tasks.

Contribution

NRGPT unifies GPT with energy-based models through a minimal modification of the GPT setting, recasting inference as exploration of an energy landscape.

Findings

01

NRGPT performs well on the Shakespeare, ListOPS, and OpenWebText datasets.

02

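Under certain conditions, the model's inference can be interpreted as gradient descent on the energy landscape, although these conditions do not necessarily yield the best-performing models.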

03

NRGPT appears to be more resistant to overfitting during long training.
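Finding 02 holds only "under certain circumstances" (per the abstract), and this page does not state what those circumstances are. As a generic, hedged illustration of why an iterative update is not automatically gradient descent: a linear update x ← x − ηWx follows the negative gradient of the energy E(x) = ½ xᵀWx only when W is symmetric. A skew-symmetric W gives a rotation field with no scalar energy to descend. This toy example is not NRGPT's model or its condition; all names (`is_gradient_field`, `run_updates`, `W_sym`, `W_rot`) are hypothetical.

```python
import numpy as np

# Toy illustration (not NRGPT): a linear update x <- x - eta * W x is gradient
# descent on E(x) = 1/2 x^T W x only when W is symmetric.

def is_gradient_field(W, tol=1e-10):
    """The linear field f(x) = -W x has a scalar potential iff W == W^T."""
    return np.allclose(W, W.T, atol=tol)

def quadratic_energy(W, x):
    """E(x) = 1/2 x^T W x (its gradient is W x when W is symmetric)."""
    return 0.5 * x @ W @ x

def run_updates(W, x0, eta, steps=50):
    """Apply x <- x - eta * W x repeatedly, recording E(x) after each step."""
    x = x0.copy()
    energies = [quadratic_energy(W, x)]
    for _ in range(steps):
        x = x - eta * (W @ x)
        energies.append(quadratic_energy(W, x))
    return x, energies

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
W_sym = A @ A.T + 0.1 * np.eye(4)            # symmetric positive definite
W_rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # skew-symmetric: a rotation field

# Step size 1 / lambda_max keeps every eigenvalue of (I - eta W) in [0, 1),
# so the energy can only go down.
eta = 1.0 / np.linalg.eigvalsh(W_sym).max()
x_sym, E_sym = run_updates(W_sym, np.ones(4), eta)

# The skew-symmetric field has no energy to descend: x^T W x is always 0, and
# each step grows the norm, since ||(I - eta W) x||^2 = ||x||^2 + eta^2 ||W x||^2.
x_rot, _ = run_updates(W_rot, np.array([1.0, 0.0]), eta=0.1, steps=10)
```

The symmetry test is the linear special case of a general rule: a vector field is a gradient field only if its Jacobian is symmetric.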

Abstract

Generative Pre-trained Transformer (GPT) architectures are the most popular design for language modeling. Energy-based modeling (EBM) is a different paradigm that views inference as a dynamical process operating on an energy landscape. We propose a minimal modification of the GPT setting that unifies it with the EBM framework. The inference step of our model, which we call eNeRgy-GPT (NRGPT), is conceptualized as an exploration of the tokens on the energy landscape. We prove, and verify empirically, that under certain circumstances this exploration becomes gradient descent, although such models do not necessarily perform best. We demonstrate that our model performs well on simple language (the Shakespeare dataset), algebraic ListOPS tasks, and richer settings such as OpenWebText language modeling. We also observe that our models may be more resistant to overfitting, doing so only…
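The abstract does not give NRGPT's energy function. To make "inference as gradient descent on an energy landscape" concrete, the sketch below assumes the log-sum-exp energy familiar from modern Hopfield networks, E(x) = ½‖x‖² − (1/β) log Σⱼ exp(β kⱼ·x), over a single token state x with hypothetical keys K and inverse temperature β. Its gradient is x − Kᵀ softmax(βKx), so a gradient step looks like a residual attention update. Because E is a convex quadratic plus a concave log-sum-exp term, steps with 0 < η ≤ 1 cannot increase E. This is an assumed stand-in, not the paper's model; `energy`, `grad_energy`, and `descend` are illustrative names.

```python
import numpy as np

# Illustrative assumption: a modern-Hopfield-style energy over a token vector x,
# with "keys" K (one per row) and inverse temperature beta. Not NRGPT's energy.

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def energy(x, K, beta):
    """E(x) = 1/2 ||x||^2 - (1/beta) * logsumexp(beta * K x)."""
    z = beta * (K @ x)
    m = z.max()
    lse = m + np.log(np.exp(z - m).sum())
    return 0.5 * (x @ x) - lse / beta

def grad_energy(x, K, beta):
    """dE/dx = x - K^T softmax(beta K x); the second term is an attention readout."""
    return x - K.T @ softmax(beta * (K @ x))

def descend(x0, K, beta=2.0, eta=0.5, steps=20):
    """Gradient descent x <- x - eta * dE/dx, equivalently the residual-style
    update x <- (1 - eta) * x + eta * K^T softmax(beta K x).
    Returns the final state and the energy after every step."""
    x = x0.copy()
    energies = [energy(x, K, beta)]
    for _ in range(steps):
        x = x - eta * grad_energy(x, K, beta)
        energies.append(energy(x, K, beta))
    return x, energies

rng = np.random.default_rng(1)
K = rng.normal(size=(5, 3))  # 5 hypothetical keys in a 3-dimensional space
x0 = rng.normal(size=3)      # hypothetical initial token state
x_final, E = descend(x0, K)
```

With `eta=1.0` each step jumps straight to the attention readout Kᵀ softmax(βKx), the classic modern-Hopfield retrieval update; smaller `eta` keeps part of the previous state, as a residual connection does.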

Code & Models

Models

🤗 bsaha205/NRGPT-H-FF2W-128M-OWT (model, 276 downloads)

Videos

NRGPT: An Energy-based Alternative for GPT (SlidesLive)