Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

Leonardo Castro-Gonzalez; Yi-Ling Chung; Hannah Rose Kirk; John Francis; Angus R. Williams; Pica Johansson; Jonathan Bright

arXiv:2401.12295 · cs.CL · July 29, 2025 · 1 citation

TL;DR

This paper reviews three cost-effective machine learning techniques (weak supervision, transfer learning, and prompt engineering) for social data science. It demonstrates their effectiveness with minimal labelled data on six social science applications (two tasks, each paired with three dataset makeups) and provides practical guidance and code resources.

Contribution

It reviews and compares recent cheap learning techniques for social science applications, with particular attention to zero-shot prompting of large language models, and accompanies them with practical demonstrations and resources.

Findings

01. Prompting large language models achieves high accuracy at low cost.
02. All three techniques perform well across diverse social science tasks.
03. The paper provides practical guides and code for implementation.

Abstract

The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These 'cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this article we review three 'cheap' techniques that have developed in recent years: weak supervision, transfer learning and prompt engineering. For the latter, we also review the particular case of zero-shot prompting of large language models. For each technique we provide a guide to how it works and demonstrate its application across six different realistic social science applications (two different tasks paired with three different dataset makeups). We show good performance for all techniques, and…
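As an illustration of the first technique, weak supervision replaces hand-labelled examples with heuristic "labelling functions" whose noisy votes are combined into training labels. The minimal Python sketch below is not the authors' pipeline: the keyword rules, the POS/NEG label scheme, and the helper names (`lf_positive_words`, `lf_negative_words`, `lf_exclamation`, `weak_label`) are hypothetical, and votes are combined by a plain majority that abstains on ties.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN: Optional[int] = None  # a labelling function returns None when it has no opinion
POS, NEG = 1, 0                # hypothetical binary label scheme


# Hypothetical keyword heuristics; in practice a domain expert would write these.
def lf_positive_words(text: str) -> Optional[int]:
    return POS if any(w in text.lower() for w in ("great", "excellent", "loved")) else ABSTAIN


def lf_negative_words(text: str) -> Optional[int]:
    return NEG if any(w in text.lower() for w in ("awful", "boring", "hated")) else ABSTAIN


def lf_exclamation(text: str) -> Optional[int]:
    return POS if text.endswith("!") else ABSTAIN


LABELLING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_positive_words,
    lf_negative_words,
    lf_exclamation,
]


def weak_label(text: str) -> Optional[int]:
    """Combine labelling-function votes by simple majority; abstain on no votes or ties."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


unlabelled = [
    "I loved this film!",
    "A boring, awful plot.",
    "It was released in 1999.",
    "Great cast, but I hated the ending.",
]
weak_labels = [weak_label(t) for t in unlabelled]
print(weak_labels)  # [1, 0, None, None]
```

Only the examples that receive a label (here the first two) would be passed on to train a downstream classifier. Dedicated tools such as Snorkel go further than this majority vote by fitting a label model that estimates each labelling function's accuracy and weights its votes accordingly.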

Code & Models

Repositories

turing-online-safety-codebase/cheap_learning (official)

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Machine Learning and Algorithms