Generating Diverse Training Samples for Relation Extraction with Large Language Models

Zexuan Li; Hongliang Dai; Piji Li

arXiv:2505.23108·cs.CL·May 30, 2025

TL;DR

This paper studies how to increase the diversity of training samples that Large Language Models (LLMs) generate for Relation Extraction (RE) while keeping those samples correct, so that models trained on the generated data perform better.

Contribution

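It introduces two techniques for increasing sample diversity: giving explicit instructions in In-Context Learning (ICL) prompts, and fine-tuning the LLM with Direct Preference Optimization (DPO). Both are shown to improve the quality of the data generated for RE.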

Findings

01 Both methods (ICL prompting and DPO fine-tuning) improve the quality of the generated training data.

02 Non-LLM models trained on the generated data can outperform RE performed directly by an LLM.

03 More diverse training samples lead to better relation extraction performance.

Abstract

Using Large Language Models (LLMs) to generate training data can potentially be a preferable way to improve zero-shot or few-shot NLP tasks. However, many problems in this direction remain to be investigated. For the task of Relation Extraction (RE), we find that samples generated by directly prompting LLMs can easily be structurally very similar to each other: they tend to use a limited variety of phrasing when expressing the relation between a pair of entities. Therefore, in this paper, we study how to effectively improve the diversity of the training samples generated with LLMs for RE, while also maintaining their correctness. We first try to make the LLMs produce dissimilar samples by directly giving instructions in In-Context Learning (ICL) prompts. Then, we propose an approach to fine-tune LLMs for diverse training sample generation through Direct Preference Optimization…
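The abstract does not say how diversity is measured or how the preference data for DPO is built, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that structural similarity can be approximated by token-level Jaccard overlap after masking the entity mentions, and that a preference record prefers the batch of generated samples with the lower mean pairwise similarity. The function names (`mask_entities`, `mean_pairwise_similarity`, `build_preference_pair`) and the prompt/chosen/rejected record layout are hypothetical choices made for illustration.

```python
from itertools import combinations


def mask_entities(sentence, head, tail):
    """Replace entity mentions with placeholders so that similarity
    reflects the phrasing of the relation, not the entity names."""
    return sentence.replace(head, "[HEAD]").replace(tail, "[TAIL]")


def jaccard(a, b):
    """Token-set Jaccard overlap between two strings (assumed proxy
    for structural similarity; the paper may use something else)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(samples):
    """samples: list of (sentence, head_entity, tail_entity) tuples."""
    masked = [mask_entities(s, h, t) for s, h, t in samples]
    pairs = list(combinations(masked, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def build_preference_pair(prompt, batch_a, batch_b):
    """Build one preference record that prefers the more diverse batch,
    i.e. the one with the lower mean pairwise similarity (assumption)."""
    sim_a = mean_pairwise_similarity(batch_a)
    sim_b = mean_pairwise_similarity(batch_b)
    chosen, rejected = (batch_a, batch_b) if sim_a <= sim_b else (batch_b, batch_a)

    def render(batch):
        return "\n".join(sentence for sentence, _, _ in batch)

    return {"prompt": prompt, "chosen": render(chosen), "rejected": render(rejected)}


# Illustrative data only; these sentences are not from the paper.
repetitive = [
    ("Paris is the capital of France.", "Paris", "France"),
    ("Tokyo is the capital of Japan.", "Tokyo", "Japan"),
]
varied = [
    ("Paris is the capital of France.", "Paris", "France"),
    ("Japan's government sits in Tokyo, its capital city.", "Tokyo", "Japan"),
]

record = build_preference_pair(
    "Generate sentences expressing the relation 'capital of'.",
    repetitive,
    varied,
)
print(record["chosen"])
```

In this toy example the two sentences in `repetitive` become identical once the entities are masked, so the `varied` batch is selected as the preferred output. Correctness of the expressed relation, which the abstract also stresses, is not checked here and would need a separate filter.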

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques