Rethinking Scale: The Efficacy of Fine-Tuned Open-Source LLMs in Large-Scale Reproducible Social Science Research
Marcello Carammia, Stefano Maria Iacus, Giuseppe Porro

TL;DR
This paper shows that small, fine-tuned open-source LLMs can match or outperform proprietary models like ChatGPT-4 in social science tasks, promoting transparency, reproducibility, and cost-effectiveness.
Contribution
It demonstrates the effectiveness of fine-tuning open-source LLMs for social science research and proposes a hybrid workflow combining open and closed models.
Findings
Fine-tuned open-source LLMs can outperform large proprietary models.
Training set size impacts fine-tuning success.
A hybrid workflow enhances performance and reproducibility.
Abstract
Large Language Models (LLMs) are distinguished by their architecture, which dictates their parameter size and performance capabilities. Social scientists have increasingly adopted LLMs for text classification tasks, which are difficult to scale with human coders. While very large, closed-source models often deliver superior performance, their use presents significant risks. These include lack of transparency, potential exposure of sensitive data, challenges to replicability, and dependence on proprietary systems. Additionally, their high costs make them impractical for large-scale research projects. In contrast, open-source models, although available in various sizes, may underperform compared to commercial alternatives if used without further fine-tuning. However, open-source models offer distinct advantages: they can be run locally (ensuring data privacy), fine-tuned for specific…
Taxonomy
Topics: Research Data Management Practices
Methods: Sparse Evolutionary Training
