
TL;DR
MiniGPT is a from-scratch PyTorch implementation of GPT-style language modeling. It shows how to build and train a character-level autoregressive model from first principles, with an emphasis on clarity and reproducibility.
Contribution
MiniGPT provides a detailed, reproducible from-scratch implementation of the core GPT components, intended as an educational resource and a baseline for future research.
Findings
A baseline 0.83M-parameter model reaches a validation loss of 1.7236 after 3000 training iterations on Tiny Shakespeare.
A 10.77M-parameter model reaches a validation loss of 1.4780.
Generated text exhibits recognizable Shakespeare-style dialogue.
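To put the two validation losses on a more intuitive scale, the sketch below converts a per-character cross-entropy loss into perplexity and bits per character. It assumes the reported losses are mean cross-entropy values in nats (natural log), which is the PyTorch default but is not stated in the summary. The function names `perplexity` and `bits_per_char` are illustrative and not taken from the paper.

```python
import math

def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss given in nats (assumed unit)."""
    return math.exp(loss_nats)

def bits_per_char(loss_nats: float) -> float:
    """Convert a per-character loss from nats to bits."""
    return loss_nats / math.log(2)

# Validation losses reported in the Findings above.
reported = {"0.83M": 1.7236, "10.77M": 1.4780}

for name, loss in reported.items():
    print(f"{name}: perplexity={perplexity(loss):.2f}, "
          f"bits/char={bits_per_char(loss):.2f}")
```

Under that assumption, the smaller model's perplexity is about 5.6 and the larger model's about 4.4, meaning the larger model is on average choosing among fewer equally likely next characters.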
Abstract
This paper presents MiniGPT, a compact from-scratch implementation of GPT-style autoregressive language modeling in PyTorch. The aim is to rebuild the core GPT pipeline from first principles after studying the design of nanoGPT by Andrej Karpathy, while keeping the model and training code independently written in a single notebook. MiniGPT implements token and positional embeddings, causal multi-head self-attention, pre-LayerNorm Transformer blocks, residual connections, feed-forward MLP layers, next-token cross-entropy training (teacher forcing), validation tracking, checkpoint selection, and autoregressive text generation. The paper evaluates the implementation on the Tiny Shakespeare dataset using character-level tokenization. A baseline 0.83M-parameter model reaches a validation loss of 1.7236 after 3000 training iterations. A stronger 10.77M-parameter configuration, using a larger…
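The character-level tokenization and teacher-forced next-token setup named in the abstract can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper names (`build_vocab`, `encode`, `decode`, `next_token_pairs`) and the `block_size` parameter are hypothetical, and a real training loop would sample random windows and batch them as tensors.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)

def next_token_pairs(ids, block_size):
    """Build (input, target) windows where the target is the input
    shifted one position to the right (teacher forcing)."""
    pairs = []
    for start in range(len(ids) - block_size):
        x = ids[start:start + block_size]
        y = ids[start + 1:start + block_size + 1]
        pairs.append((x, y))
    return pairs

# Tiny stand-in corpus; the paper uses Tiny Shakespeare.
text = "hello world"
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
x, y = next_token_pairs(ids, block_size=4)[0]
print(repr(decode(x, itos)), "->", repr(decode(y, itos)))
```

At every position t in a window, the model sees the input characters up to t and is trained to predict the target character at t, so one window of length `block_size` supplies `block_size` training examples.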
