Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection

Bhawesh Kumar; Jonathan Amar; Eric Yang; Nan Li; Yugang Jia

arXiv:2405.06093 · cs.LG · August 6, 2024 · 1 cites

Open Access

TL;DR

This paper demonstrates that selective fine-tuning of large language models using high-confidence, auto-generated labels can reduce reliance on costly human annotations, especially in specialized healthcare tasks like Schedule-of-Event table detection.

Contribution

It introduces a filtering mechanism for auto-generated labels and shows that fine-tuning LLMs with these labels can outperform existing models and approach the performance of models trained on non-expert annotations.

Findings

01. Fine-tuned PaLM-2 exceeds Gemini-Pro 1.0 and other LLMs in performance.

02. Filtering high-confidence labels reduces noise and improves fine-tuning effectiveness.

03. Auto-generated labels can substitute for costly expert annotations in specialized tasks.

Abstract

Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific, expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We show that PaLM-2 fine-tuned with those labels achieves performance that exceeds that of gemini-pro 1.0 and other LLMs. Furthermore, its performance is close to that of a PaLM-2 fine-tuned on labels obtained from non-expert annotators.…
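
The abstract does not say how label confidence is measured, so the following is only an illustrative sketch of one possible filter, not the authors' mechanism. It assumes each table was labeled several times by the labeling LLM and keeps a table only when the majority label reaches a chosen agreement threshold. The function name `filter_high_confidence`, the label strings `"SoE"` and `"not_SoE"`, the table IDs, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_high_confidence(
    votes_by_table: Dict[str, List[str]],
    min_agreement: float = 0.8,
) -> Dict[str, Tuple[str, float]]:
    """Keep tables whose majority label reaches `min_agreement`.

    Assumption (not from the paper): confidence is approximated by the
    fraction of repeated LLM labels that agree with the majority label.
    """
    kept: Dict[str, Tuple[str, float]] = {}
    for table_id, votes in votes_by_table.items():
        if not votes:
            continue  # no labels collected for this table; skip it
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        if agreement >= min_agreement:
            kept[table_id] = (label, agreement)
    return kept


# Hypothetical repeated labels for three tables.
votes = {
    "table_a": ["SoE", "SoE", "SoE", "SoE", "SoE"],
    "table_b": ["SoE", "not_SoE", "SoE", "not_SoE", "SoE"],
    "table_c": ["not_SoE", "not_SoE", "not_SoE", "not_SoE", "SoE"],
}
selected = filter_high_confidence(votes, min_agreement=0.8)
```

Under these assumptions, `table_b` is dropped because only three of its five labels agree, and the remaining tables would form the fine-tuning set. A stricter threshold trades a smaller training set for less label noise.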

Taxonomy

Topics: Data Quality and Management · Time Series Analysis and Forecasting · Context-Aware Activity Recognition Systems