From Tokens To Agents: A Researcher's Guide To Understanding Large Language Models
Daniele Barolo

TL;DR
This paper provides a comprehensive, accessible framework for understanding large language models, focusing on their components, capabilities, and limitations to guide researchers in their application.
Contribution
It offers a non-technical, structured analysis of LLMs' core components and introduces a reasoning framework for assessing their suitability for research tasks.
Findings
Analyzes six essential LLM components and their implications.
Develops a framework for critical evaluation of LLMs in research.
Includes a case study on simulating social media dynamics with LLM agents.
Abstract
Researchers face a critical choice: how to use -- or not use -- large language models in their work. Using them well requires understanding the mechanisms that shape what LLMs can and cannot do. This chapter makes LLMs comprehensible without requiring technical expertise, breaking down six essential components: pre-training data, tokenization and embeddings, transformer architecture, probabilistic generation, alignment, and agentic capabilities. Each component is analyzed through both its technical foundations and its research implications, identifying specific affordances and limitations. Rather than offering prescriptive guidance, the chapter develops a framework for reasoning critically about whether and how LLMs fit specific research needs. The framework is illustrated through an extended case study on simulating social media dynamics with LLM-based agents.
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Law · Computational and Text Analysis Methods
