From Tokens To Agents: A Researcher's Guide To Understanding Large Language Models

Daniele Barolo

arXiv:2603.19269·cs.CL·March 23, 2026

Open Access

TL;DR

This paper provides a comprehensive, accessible framework for understanding large language models, focusing on their components, capabilities, and limitations to guide researchers in their application.

Contribution

It offers a non-technical, structured analysis of LLMs' core components and introduces a reasoning framework for assessing their suitability for research tasks.

Findings

01. Analyzes six essential LLM components and their implications.

02. Develops a framework for critical evaluation of LLMs in research.

03. Includes a case study on simulating social media dynamics with LLM agents.

Abstract

Researchers face a critical choice: whether and how to use large language models in their work. Using them well requires understanding the mechanisms that shape what LLMs can and cannot do. This chapter makes LLMs comprehensible without requiring technical expertise, breaking down six essential components: pre-training data, tokenization and embeddings, transformer architecture, probabilistic generation, alignment, and agentic capabilities. Each component is analyzed through both its technical foundations and its research implications, identifying specific affordances and limitations. Rather than offering prescriptive guidance, the chapter develops a framework for reasoning critically about whether and how LLMs fit specific research needs, and illustrates it with an extended case study on simulating social media dynamics with LLM-based agents.

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Law · Computational and Text Analysis Methods