Are Large Language Models the future crowd workers of Linguistics?

Iris Ferrazzo

arXiv:2502.10266·cs.CL·February 17, 2025

Open Access

TL;DR

This study investigates whether Large Language Models like GPT-4 can replace human participants in empirical linguistic research, showing they can outperform humans in certain tasks and highlighting the potential for broader application in humanities research.

Contribution

It demonstrates the effectiveness of LLMs in linguistic data elicitation tasks and explores advanced prompting techniques to improve alignment with human performance.

Findings

01

LLMs outperform humans in certain linguistic tasks

02

Chain-of-Thought prompting improves LLM performance

03

LLMs show high versatility in linguistic data collection

Abstract

Data elicitation from human participants is one of the core data collection strategies used in empirical linguistic research. The number of participants in such studies varies considerably, ranging from a handful to crowdsourcing scale. Although both settings yield rich and extensive data, they come with many disadvantages, such as low control of participants' attention during task completion, precarious working conditions in crowdsourcing environments, and time-consuming experimental designs. For these reasons, this research aims to answer the question of whether Large Language Models (LLMs) may overcome those obstacles if included in empirical linguistic pipelines. Two reproduction case studies are conducted to gain clarity on this matter: Cruz (2023) and Lombard et al. (2021). The two forced elicitation tasks, originally designed for human participants, are…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification