MiniGPT: Rebuilding GPT from First Principles

Jibin Joseph

arXiv:2605.17398·cs.CL·May 19, 2026

TL;DR

MiniGPT is a from-scratch PyTorch implementation of GPT-style language modeling. It shows how to build and train a character-level autoregressive model from first principles, with an emphasis on clarity and reproducibility.
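As a rough illustration of what character-level autoregressive modeling involves, the sketch below builds a character vocabulary and the shifted (context, target) windows used for next-token training. It is not the paper's code: the sample string, `block_size`, and the helper names `encode`, `decode`, and `next_token_pairs` are all assumptions made for this example.

```python
# Hypothetical sample text; the paper itself trains on Tiny Shakespeare.
text = "to be, or not to be"

# Vocabulary: every distinct character, sorted so ids are assigned stably.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(s):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)


def next_token_pairs(ids, block_size):
    """Yield (context, target) windows; the target is the context shifted by one."""
    for start in range(len(ids) - block_size):
        x = ids[start:start + block_size]
        y = ids[start + 1:start + block_size + 1]
        yield x, y


ids = encode(text)
pairs = list(next_token_pairs(ids, block_size=8))  # block_size is an assumed value
```

Each target window is the context shifted one position to the right, so a single window supplies one next-character prediction per position.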

Contribution

It provides a detailed, reproducible implementation of GPT components from scratch, serving as an educational resource and baseline for future research.

Findings

01

A 0.83M-parameter model achieves a validation loss of 1.7236.

02

A 10.77M-parameter model reaches a validation loss of 1.4780.

03

Generated text exhibits recognizable Shakespeare-style dialogue.
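To put the two validation losses on a more intuitive scale, they can be converted to per-character perplexity. This is a small sketch under one assumption: that the reported losses are mean cross-entropy values in nats, which is what PyTorch's cross-entropy loss produces by default. The listing does not state the unit explicitly.

```python
import math

# Validation losses reported in the findings above.
val_losses = {"0.83M": 1.7236, "10.77M": 1.4780}

# Perplexity = exp(mean cross-entropy), ASSUMING the loss is measured in nats.
perplexity = {name: math.exp(loss) for name, loss in val_losses.items()}

for name, ppl in perplexity.items():
    print(f"{name} parameters: per-character perplexity of about {ppl:.2f}")
```

Under that assumption, a lower perplexity means the model is, on average, choosing among fewer equally likely next characters.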

Abstract

This paper presents MiniGPT, a compact from-scratch implementation of GPT-style autoregressive language modeling in PyTorch. The aim is to rebuild the core GPT pipeline from first principles after studying the design of nanoGPT by Andrej Karpathy, while keeping the model and training code independently written in a single notebook. MiniGPT implements token and positional embeddings, causal multi-head self-attention, pre-LayerNorm Transformer blocks, residual connections, feed-forward MLP layers, next-token cross-entropy training (teacher forcing), validation tracking, checkpoint selection, and autoregressive text generation. The implementation is evaluated on the Tiny Shakespeare dataset using character-level tokenization. A baseline 0.83M-parameter model reaches a validation loss of 1.7236 after 3000 training iterations. A stronger 10.77M-parameter configuration, using a larger…
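The components listed in the abstract fit together roughly as in the minimal PyTorch sketch below. It is an illustration rather than the paper's notebook: the class names (`CausalSelfAttention`, `Block`, `TinyGPT`), the hyperparameter values, the GELU activation, the 4x MLP width, and the absence of dropout are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        # Lower-triangular mask: position t may attend only to positions <= t.
        mask = torch.tril(torch.ones(block_size, block_size, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        hd = C // self.n_head
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, hd).transpose(1, 2)  # (B, heads, T, hd)
        k = k.view(B, T, self.n_head, hd).transpose(1, 2)
        v = v.view(B, T, self.n_head, hd).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (hd ** 0.5)      # scaled dot-product scores
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        # Pre-LayerNorm: normalize before each sublayer, then add the residual.
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x


class TinyGPT(nn.Module):
    def __init__(self, vocab_size, n_embd=32, n_head=4, n_layer=2, block_size=16):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(
            *[Block(n_embd, n_head, block_size) for _ in range(n_layer)]
        )
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)  # token + positional embeddings
        logits = self.head(self.ln_f(self.blocks(x)))
        loss = None
        if targets is not None:
            # Next-token cross-entropy over every position (teacher forcing).
            loss = F.cross_entropy(logits.reshape(B * T, -1), targets.reshape(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx[:, -self.block_size:])  # crop to the context window
            probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution for the next token
            nxt = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, nxt], dim=1)
        return idx


# Usage with assumed toy sizes: a vocabulary of 10 ids and sequences of length 8.
torch.manual_seed(0)
model = TinyGPT(vocab_size=10)
idx = torch.randint(0, 10, (2, 8))
targets = torch.randint(0, 10, (2, 8))
logits, loss = model(idx, targets)
sample = model.generate(idx, max_new_tokens=5)
```

The causal mask is what makes teacher forcing valid here: because position t cannot see later tokens, all positions in a window can be trained in a single forward pass.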
