AIDE: Attribute-Guided MultI-Hop Data Expansion for Data Scarcity in Task-Specific Fine-tuning
Jiayu Li, Xuan Zhu, Fang Liu, Yanjun Qi

TL;DR
AIDE is a novel data synthesis framework that uses attribute-guided multi-hop expansion to generate diverse, relevant training data from very few seeds, significantly improving task-specific fine-tuning of large language models.
Contribution
This paper introduces AIDE, a multi-hop data expansion method guided by attributes, which effectively synthesizes relevant data from minimal seeds for fine-tuning large language models.
Findings
AIDE enables effective fine-tuning from just 10 seed data points.
AIDE outperforms existing data synthesis methods such as Evol-Instruct by over 30%.
AIDE improves fine-tuning results across multiple large language models.
Abstract
Fine-tuning large language models (LLMs) for specific tasks requires diverse, high-quality training data. However, obtaining sufficient relevant data remains a significant challenge. Existing data synthesis methods either depend on extensive seed datasets or struggle to balance task relevance and data diversity. To address these challenges, we propose Attribute-guided multI-hop Data Expansion (AIDE), a novel data synthesis framework that uses a multi-hop process to expand very few seed data points while ensuring data diversity and task relevance. AIDE extracts the main topic and key knowledge attributes from the seeds to guide the synthesis steps. The process repeats for K hops, using the generated data as seeds. To prevent irrelevant data generation as the hop depth increases, AIDE incorporates a residual connection mechanism. Our empirical results show that AIDE enables fine-tuning of…
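The expansion loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `multi_hop_expand`, `extract`, and `synthesize` are hypothetical, the LLM calls are replaced by injected callables, and the residual connection is modeled here, purely as an assumption, by prepending the original seeds' topic and attributes to the guidance at every hop after the first.

```python
from typing import Callable, List

# Hypothetical interfaces (not from the paper): the abstract does not say how
# topics, attributes, or new examples are produced, so both are injected.
ExtractFn = Callable[[str], List[str]]            # example -> [topic, attribute, ...]
SynthFn = Callable[[str, List[str]], List[str]]   # (example, guidance) -> new examples


def multi_hop_expand(
    seeds: List[str],
    extract: ExtractFn,
    synthesize: SynthFn,
    k_hops: int,
    use_residual: bool = True,
) -> List[str]:
    """Expand a few seeds for k_hops rounds, feeding each hop's output to the next."""
    # Guidance taken from the original seeds, kept for the assumed residual step.
    root_guidance: List[str] = []
    for seed in seeds:
        for item in extract(seed):
            if item not in root_guidance:
                root_guidance.append(item)

    frontier = list(seeds)   # examples used as seeds in the current hop
    seen = set(seeds)        # used to drop exact duplicates
    generated: List[str] = []

    for hop in range(k_hops):
        next_frontier: List[str] = []
        for example in frontier:
            guidance = extract(example)
            if use_residual and hop > 0:
                # Assumption: re-anchor deeper hops on the original seeds'
                # topic and attributes to limit drift away from the task.
                guidance = root_guidance + guidance
            for new_example in synthesize(example, guidance):
                if new_example not in seen:
                    seen.add(new_example)
                    next_frontier.append(new_example)
        generated.extend(next_frontier)
        frontier = next_frontier   # generated data becomes the next hop's seeds

    return generated
```

In practice `extract` and `synthesize` would wrap prompts to an LLM. With a fixed branching factor per example, the number of candidates grows geometrically with the hop count, which is how a handful of seeds can yield a much larger training set.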
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification
Methods: Residual Connection · Balanced Selection
