Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models

Jiří Milička

arXiv:2408.16740 · cs.CL · August 30, 2024

TL;DR

This paper proposes a theoretical and methodological framework for analyzing texts generated by large language models from a quantitative linguistics perspective, emphasizing non-anthropomorphic approaches and the potential of LLMs as tools for studying human culture.

Contribution

It introduces a conceptual framework that distinguishes LLMs from the entities they simulate, advocates a non-anthropomorphic analysis of the models, and broadens the study of LLM-generated texts within linguistic theory.

Findings

01. The framework distinguishes LLMs from the entities they simulate.

02. It highlights the importance of non-anthropomorphic analysis.

03. It proposes using LLMs as tools for studying human culture.

Abstract

This paper addresses the conceptual, methodological and technical challenges in studying large language models (LLMs) and the texts they produce from a quantitative linguistics perspective. It builds on a theoretical framework that distinguishes between the LLM as a substrate and the entities the model simulates. The paper advocates for a strictly non-anthropomorphic approach to models while cautiously applying methodologies used in studying human linguistic behavior to the simulated entities. While natural language processing researchers focus on the models themselves, their architecture, evaluation, and methods for improving performance, we as quantitative linguists should strive to build a robust theory concerning the characteristics of texts produced by LLMs, how they differ from human-produced texts, and the properties of simulated entities. Additionally, we should explore the…

Taxonomy

Topics: Topic Modeling

Methods: Focus