LLMs as High-Dimensional Nonlinear Autoregressive Models with Attention: Training, Alignment and Inference

Vikram Krishnamurthy

arXiv:2602.00426·cs.LG·February 3, 2026

Open Access

TL;DR

This paper presents a mathematical framework that models large language models as high-dimensional nonlinear autoregressive systems with attention, giving an explicit account of their training, alignment, and inference.

Contribution

It introduces an explicit equation-level formulation of LLMs as autoregressive models with attention, unifying various training and inference techniques under a common mathematical framework.
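As a rough illustration of what "autoregressive" means here, the sketch below generates tokens one at a time, each conditioned on everything produced so far, so the sequence probability factorizes as a product of next-token conditionals. It is not the paper's formulation: `toy_logits` is a hypothetical stand-in for the trained network, and greedy decoding and a vocabulary of five tokens are assumptions chosen to keep the example deterministic.

```python
import math


def softmax(scores):
    """Turn real-valued scores into a probability vector (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context, vocab_size):
    """Hypothetical stand-in for the model: a nonlinear map from the
    full context to one score per vocabulary item."""
    phase = sum(context) + len(context)
    return [math.sin(0.7 * (v + 1) * phase) for v in range(vocab_size)]


def generate(prompt, steps, vocab_size=5):
    """Autoregressive loop: predict the next token from the context,
    append it, and feed the longer context back in."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens, vocab_size))
        # Greedy decoding (an assumption): pick the most probable token.
        next_token = max(range(vocab_size), key=lambda v: probs[v])
        tokens.append(next_token)
    return tokens
```

Calling `generate([1, 2], 4)` returns the two prompt tokens followed by four generated ones. Replacing the greedy choice with sampling from `probs` would give stochastic generation instead.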

Findings

01 Self-attention as bilinear-softmax-linear composition

02 Analysis of alignment behaviors like sycophancy

03 Insights into inference phenomena such as hallucination and in-context learning
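The first finding can be made concrete with a minimal single-head sketch, written in plain Python under stated assumptions. The three stages are: a bilinear score between a query and each key, a softmax over those scores, and a linear combination of the values. The causal restriction to earlier positions and the scaling by the square root of the key dimension follow common transformer practice and are assumptions here, as are the names `Wq`, `Wk`, and `Wv`; the paper's own notation may differ.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of input vectors."""
    d_k = len(Wq)  # dimension of the query/key vectors
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    out = []
    for t in range(len(X)):
        # Bilinear stage: score(t, s) = (Wq x_t) . (Wk x_s), scaled.
        # Only positions s <= t are visible (causal assumption).
        scores = [
            sum(q * k for q, k in zip(Q[t], K[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        # Softmax stage: normalize the scores into attention weights.
        weights = softmax(scores)
        # Linear stage: weighted sum of the value vectors.
        out.append([
            sum(w * V[s][j] for s, w in enumerate(weights))
            for j in range(len(V[0]))
        ])
    return out
```

With identity matrices for all three projections, the first output equals the first input, because position 0 can only attend to itself. Later outputs are convex combinations of the inputs seen so far.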

Abstract

Large language models (LLMs) based on transformer architectures are typically described through collections of architectural components and training procedures, obscuring their underlying computational structure. This review article provides a concise mathematical reference for researchers seeking an explicit, equation-level description of LLM training, alignment, and generation. We formulate LLMs as high-dimensional nonlinear autoregressive models with attention-based dependencies. The framework encompasses pretraining via next-token prediction, alignment methods such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), rejection sampling fine-tuning (RSFT), and reinforcement learning from verifiable rewards (RLVR), as well as autoregressive generation during inference. Self-attention emerges naturally as a repeated bilinear-softmax-linear…

Taxonomy

Topics: Topic Modeling · Ferroelectric and Negative Capacitance Devices · Natural Language Processing Techniques