Generating Diverse Training Samples for Relation Extraction with Large Language Models
Zexuan Li, Hongliang Dai, Piji Li

TL;DR
This paper studies how to increase the diversity of the training samples that Large Language Models (LLMs) generate for Relation Extraction (RE) while keeping those samples correct, with the aim of improving zero- and few-shot RE performance.
Contribution
It introduces two techniques for increasing sample diversity: instructing LLMs to produce dissimilar samples through In-Context Learning (ICL) prompts, and fine-tuning LLMs with Direct Preference Optimization (DPO).
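The ICL-based technique can be illustrated with a minimal Python sketch that assembles a prompt asking an LLM for new samples of one relation that differ in structure and wording from the given demonstrations. The instruction text, the (head, relation, tail, sentence) demonstration format, and the function name build_diverse_icl_prompt are hypothetical; this is not the paper's actual prompt.

```python
from typing import List, Tuple

# Hypothetical demonstration format: (head entity, relation, tail entity, sentence).
Demo = Tuple[str, str, str, str]


def build_diverse_icl_prompt(relation: str, demos: List[Demo], n_new: int) -> str:
    """Build an ICL prompt that asks an LLM for new RE samples of `relation`
    whose structure and phrasing differ from the demonstrations.

    The instruction wording is an illustrative assumption.
    """
    lines = [
        f"Generate {n_new} new sentences that express the relation "
        f"'{relation}' between a head entity and a tail entity.",
        # Explicit diversity instruction: avoid reusing the demos' templates.
        "Each new sentence must use a different sentence structure and "
        "different phrasing from the examples below and from each other.",
        "",
        "Examples:",
    ]
    # Number each demonstration and show the entity pair with its sentence.
    for i, (head, rel, tail, sentence) in enumerate(demos, start=1):
        lines.append(f"{i}. head: {head} | relation: {rel} | tail: {tail}")
        lines.append(f"   sentence: {sentence}")
    lines.append("")
    lines.append("New samples (same format):")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [("Apple", "founded_by", "Steve Jobs", "Apple was founded by Steve Jobs.")]
    print(build_diverse_icl_prompt("founded_by", demos, n_new=3))
```

The returned string would be sent to whichever LLM is used for generation. The diversity requirement lives entirely in the instruction text, which is what distinguishes this approach from fine-tuning.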
Findings
Both methods improve the quality of generated training data.
Non-LLM RE models trained on the generated data can outperform LLMs that perform RE directly.
Enhanced diversity leads to better relation extraction performance.
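One simple, illustrative way to quantify the structural similarity that greater diversity is meant to reduce is to mask the two entity mentions in each sentence and compare the remaining word bigrams across samples. The sketch below computes the mean pairwise Jaccard similarity of these masked bigram sets. The metric and the names template_bigrams and mean_pairwise_similarity are assumptions for illustration, not the measure used in the paper.

```python
import re
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical sample format: (sentence, head entity, tail entity).
Sample = Tuple[str, str, str]


def template_bigrams(sentence: str, head: str, tail: str) -> Set[Tuple[str, str]]:
    """Mask the entity mentions, lowercase, and return the set of word bigrams."""
    masked = sentence.replace(head, " HEAD ").replace(tail, " TAIL ")
    tokens = re.findall(r"\w+", masked.lower())
    return set(zip(tokens, tokens[1:]))


def mean_pairwise_similarity(samples: List[Sample]) -> float:
    """Average Jaccard similarity of entity-masked bigram sets over all pairs.

    Higher values mean the samples share more phrasing, i.e. lower diversity.
    Returns 0.0 when there are fewer than two samples.
    """
    grams = [template_bigrams(s, h, t) for s, h, t in samples]
    pairs = list(combinations(grams, 2))
    if not pairs:
        return 0.0
    scores = []
    for a, b in pairs:
        union = a | b
        # Two empty bigram sets are treated as identical.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

In this sketch, "Apple was founded by Steve Jobs." and "Tesla was founded by Elon Musk." reduce to the same masked template and score 1.0. A rephrasing such as "Steve Jobs co-founded Apple in 1976." shares no bigrams with the first sentence and scores 0.0.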
Abstract
Using Large Language Models (LLMs) to generate training data may be a preferable way to improve zero- or few-shot NLP tasks. However, many problems in this direction remain to be investigated. For the task of Relation Extraction (RE), we find that samples generated by directly prompting LLMs can easily have high structural similarity to one another: they tend to use a limited variety of phrasing when expressing the relation between a pair of entities. Therefore, in this paper, we study how to effectively improve the diversity of the training samples generated with LLMs for RE while also maintaining their correctness. We first try to make the LLMs produce dissimilar samples by directly giving instructions in In-Context Learning (ICL) prompts. Then, we propose an approach to fine-tune LLMs for diverse training sample generation through Direct Preference Optimization…
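For reference, the standard DPO objective can be sketched in plain Python for a single preference pair. It takes the summed log-probabilities of a preferred sample y_w and a dispreferred sample y_l under the policy being fine-tuned and under a frozen reference model, and computes loss = -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). The truncated abstract does not say how the paper builds its preference pairs. Treating a more diverse yet correct generation as y_w is an assumption here, as is the default beta = 0.1.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls deviation from the reference
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full generated output
    (hypothetically, a batch of RE training sentences) under the trainable
    policy or the frozen reference model. "Chosen" is the preferred output.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2 (about 0.693). Making the preferred output relatively more likely under the policy pushes the loss below that value, which is the training signal DPO uses in place of an explicit reward model.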
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
