LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research

Yi Yang; Hanyu Duan; Jiaxin Liu; Kar Yan Tam

arXiv:2409.12722·cs.CL·September 20, 2024

TL;DR

This paper introduces a novel approach using large language models' internal states to generate valid, consistent, and reproducible text-based measures for social science research, addressing the need for reliable concept quantification from text data.

Contribution

It proposes a new method that learns concept vectors from LLMs' hidden states to produce standardized measures, improving validity and reproducibility in social science text analysis.

Findings

01 Method produces highly valid measures across studies

02 Ensures consistency and reproducibility in concept estimation

03 Applicable across diverse social science contexts

Abstract

The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text's LLM hidden states onto the concept vector. Three replication studies demonstrate the method's effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community.
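The projection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the concept vector is learned, so the code below substitutes a simple difference of class means between hidden states of high-concept and low-concept example texts; this is an assumption for illustration, not the authors' procedure. The hidden states are synthetic NumPy arrays standing in for real LLM activations, and the names `learn_concept_vector` and `concept_score` are hypothetical.

```python
import numpy as np


def learn_concept_vector(high_states, low_states):
    """Return a unit vector pointing from the low-concept mean to the high-concept mean.

    Assumption: difference of means is a stand-in for the paper's
    (unspecified here) learning procedure.
    """
    direction = high_states.mean(axis=0) - low_states.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_score(hidden_states, concept_vector):
    """Project each row of hidden states onto the concept vector."""
    return hidden_states @ concept_vector


# Synthetic stand-ins for LLM hidden states (one row per text).
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0  # hypothetical axis along which the concept varies

high = rng.normal(size=(50, dim)) + 2.0 * true_direction  # texts high on the concept
low = rng.normal(size=(50, dim)) - 2.0 * true_direction   # texts low on the concept

v = learn_concept_vector(high, low)
scores_high = concept_score(high, v)
scores_low = concept_score(low, v)

print("mean score, high-concept texts:", scores_high.mean())
print("mean score, low-concept texts:", scores_low.mean())
```

Because the concept vector is fixed once learned, scoring a new text reduces to a single dot product on its hidden state, which is deterministic for a given model and input.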

Code & Models

Repositories

hduanac/llm-measure (PyTorch, official)

Taxonomy

Topics: Computational and Text Analysis Methods