A Survey of Large Language Models

Wayne Xin Zhao; Kun Zhou; Junyi Li; Tianyi Tang; Xiaolei Wang; Yupeng Hou; Yingqian Min; Beichen Zhang; Junjie Zhang; Zican Dong; Yifan Du; Chen Yang; Yushuo Chen; Zhipeng Chen; Jinhao Jiang; Ruiyang Ren; Yifan Li; Xinyu Tang; Zikang Liu; Peiyu Liu; Jian-Yun Nie; Ji-Rong Wen

arXiv:2303.18223 · cs.CL · March 19, 2026 · 1.4k cites

Open Access · 5 Repos · 1 Model

TL;DR

This survey reviews recent advances in large language models, covering their development, techniques, capabilities, and open challenges, and emphasizes their transformative impact on AI and NLP.

Contribution

It provides a comprehensive overview of LLMs, covering background, key findings, techniques, resources, and future research directions in a unified survey.

Findings

01. Large language models achieve significant performance improvements with increased size.

02. LLMs exhibit emergent abilities not present in smaller models.

03. The survey summarizes resources and discusses future challenges for LLM development.

Abstract

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant…

Code & Models

Models

🤗 Apeters247/naxi-qwen3-14b-v5 · model · 61 dl · ♡ 1

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Adam · Linear Layer · Layer Normalization · Softmax · Residual Connection · Label Smoothing