LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

Nataliia Kholodna; Sahib Julka; Mohammad Khodadadi; Muhammed Nurullah Gumus; Michael Granitzer

arXiv:2404.02261·cs.CL·June 25, 2024·3 cites

Open Access · 1 Repo

TL;DR

This paper introduces a method using large language models within an active learning framework to efficiently annotate data for low-resource languages, significantly reducing costs and improving NLP capabilities in underrepresented languages.

Contribution

The paper proposes using LLMs such as GPT-4-Turbo as annotators inside an active learning loop for low-resource languages, reaching near-state-of-the-art performance with far less labeled data and at a much lower annotation cost.

Findings

01 Near-state-of-the-art performance with significantly less labeled data

02 Estimated annotation cost more than 42 times lower than human annotation

03 Effective integration of LLM annotators into active learning for low-resource NLP

Abstract

Low-resource languages face significant barriers in AI development due to limited linguistic resources and expertise for data labeling, rendering them rare and costly. The scarcity of data and the absence of preexisting tools exacerbate these challenges, especially since these languages may not be adequately represented in various NLP datasets. To address this gap, we propose leveraging the potential of LLMs in the active learning loop for data annotation. Initially, we conduct evaluations to assess inter-annotator agreement and consistency, facilitating the selection of a suitable LLM annotator. The chosen annotator is then integrated into a training loop for a classifier using an active learning paradigm, minimizing the amount of queried data required. Empirical evaluations, notably employing GPT-4-Turbo, demonstrate near-state-of-the-art performance with significantly reduced data…
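The loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the task is reduced to a toy one-dimensional binary classification, `llm_annotate` is a hypothetical stand-in that simulates an LLM annotator with a fixed rule, and the query strategy is plain uncertainty sampling, which this page does not confirm as the strategy used in the paper. All function names and parameters are invented for the example.

```python
import random


def llm_annotate(x):
    """Hypothetical stand-in for an LLM annotator call.

    A real setup would prompt a model such as GPT-4-Turbo; here the label
    is simulated with a fixed rule so the sketch runs offline.
    """
    return 1 if x > 0.5 else 0


def train_threshold(labeled):
    """Fit a toy 1-D classifier: pick the threshold with the fewest training errors."""
    xs = sorted(x for x, _ in labeled)
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [1.0]

    def errors(t):
        # The classifier predicts 1 when x > t.
        return sum((x > t) != bool(y) for x, y in labeled)

    return min(candidates, key=errors)


def uncertainty(x, threshold):
    """Higher score means closer to the decision boundary, i.e. more uncertain."""
    return -abs(x - threshold)


def active_learning(pool, seed_size=4, batch_size=2, rounds=5, rng=None):
    """Train on a small seed set, then repeatedly query the most uncertain points."""
    rng = rng or random.Random(0)
    pool = list(pool)
    rng.shuffle(pool)

    # Seed set: a few randomly chosen examples labeled by the annotator.
    labeled = [(x, llm_annotate(x)) for x in pool[:seed_size]]
    unlabeled = pool[seed_size:]
    threshold = train_threshold(labeled)

    for _ in range(rounds):
        if not unlabeled:
            break
        # Query step: send only the most uncertain examples to the annotator.
        unlabeled.sort(key=lambda x: uncertainty(x, threshold), reverse=True)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled += [(x, llm_annotate(x)) for x in batch]
        # Retrain on the enlarged labeled set.
        threshold = train_threshold(labeled)

    return threshold, labeled


if __name__ == "__main__":
    pool = [i / 100 for i in range(100)]
    threshold, labeled = active_learning(pool)
    print(f"annotated {len(labeled)} of {len(pool)} examples, threshold={threshold:.3f}")
```

Only the seed set and the queried batches are ever sent to the annotator, which is where the reduction in annotation volume comes from; the remaining pool stays unlabeled.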

Code & Models

Repositories

mkandai/llms-in-the-loop
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification