# The Insight-Inference Loop: Efficient Text Classification via Natural Language Inference and Threshold-Tuning

**Authors:** Sandrine Chausson, Marion Fourcade, David J. Harding, Björn Ross, Grégory Renard

PMC · DOI: 10.1177/00491241251326819 · Sociological Methods & Research · 2025-04-18

## TL;DR

This paper introduces a text classification method that requires substantially less labeled data and no machine learning expertise, and that integrates social scientists into critical steps of the workflow.

## Contribution

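The approach combines large language models pre-trained for natural language inference with a few-shot threshold-tuning algorithm rooted in active learning principles, enabling efficient, expert-informed detection of statements in text.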

## Key findings

- The method requires less human-labeled data and no machine learning expertise.
- It was successfully applied to analyze tweets from the 2020 U.S. presidential election campaign.
- The approach was benchmarked against various computational approaches across three datasets.

## Abstract

Modern computational text classification methods have brought social scientists tantalizingly close to the goal of unlocking vast insights buried in text data—from centuries of historical documents to streams of social media posts. Yet three barriers still stand in the way: the tedious labor of manual text annotation, the technical complexity that keeps these tools out of reach for many researchers, and, perhaps most critically, the challenge of bridging the gap between sophisticated algorithms and the deep theoretical understanding social scientists have already developed about human interactions, social structures, and institutions. To counter these limitations, we propose an approach to large-scale text analysis that requires substantially less human-labeled data, and no machine learning expertise, and efficiently integrates the social scientist into critical steps in the workflow. This approach, which allows the detection of statements in text, relies on large language models pre-trained for natural language inference, and a “few-shot” threshold-tuning algorithm rooted in active learning principles. We describe and showcase our approach by analyzing tweets collected during the 2020 U.S. presidential election campaign, and benchmark it against various computational approaches across three datasets.
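
The threshold-tuning idea in the abstract can be illustrated with a minimal sketch. It assumes an NLI model has already produced, for each text, an entailment score in [0, 1] for a researcher-written statement, and that a handful of texts carry expert labels. The function names, the choice of F1 as the tuning criterion, and the "closest to the threshold" query rule (a common form of uncertainty sampling) are illustrative assumptions, not the algorithm specified in the paper.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1 of the rule 'predict positive when score >= threshold'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest F1.

    Candidate thresholds are the observed scores themselves; on ties,
    the lowest candidate wins because only strict improvements replace it.
    """
    best_t, best_f1 = 1.0, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def next_to_label(unlabeled_scores: Sequence[float], threshold: float) -> int:
    """Uncertainty sampling (assumed rule): index of the unlabeled text whose
    score lies closest to the current threshold."""
    return min(range(len(unlabeled_scores)),
               key=lambda i: abs(unlabeled_scores[i] - threshold))


if __name__ == "__main__":
    # Hypothetical entailment scores and expert labels (1 = statement present).
    scores = [0.95, 0.80, 0.62, 0.40, 0.15]
    labels = [1, 1, 0, 1, 0]
    threshold, f1 = tune_threshold(scores, labels)
    print(f"threshold={threshold:.2f}, F1={f1:.3f}")
    # Ask the expert to label the most ambiguous remaining text next.
    print("query index:", next_to_label([0.90, 0.45, 0.10], threshold))
```

In this sketch the expert only labels a few texts near the decision boundary, which is one way such a workflow could keep annotation effort low; the paper's full text describes the actual procedure.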

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13038153/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13038153/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/PMC13038153/full.md

---
Source: https://tomesphere.com/paper/PMC13038153