MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering

Xiusi Chen; Jyun-Yu Jiang; Wei-Cheng Chang; Cho-Jui Hsieh; Hsiang-Fu Yu; Wei Wang

arXiv:2310.05007 · cs.CL · May 29, 2024 · 1 citation

TL;DR

MinPrompt is a graph-based data augmentation method that selects minimal yet informative sentence subsets for fine-tuning large language models in open-domain question answering, improving efficiency and accuracy.

Contribution

It introduces a novel minimal data augmentation framework using graph algorithms and unsupervised question generation for few-shot QA.

Findings

01 Achieves F1 scores comparable to or better than the baselines.

02 Reduces the amount of data needed for effective fine-tuning.

03 Demonstrates efficiency and effectiveness across multiple benchmark datasets.

Abstract

Recent advances in few-shot question answering (QA) mostly rely on the power of pre-trained large language models (LLMs) and fine-tuning in specific settings. Although the pre-training stage has already equipped LLMs with powerful reasoning capabilities, LLMs still need to be fine-tuned to adapt to specific domains to achieve the best results. In this paper, we propose to select the most informative data for fine-tuning, thereby improving the efficiency of the fine-tuning process with comparable or even better accuracy on the open-domain QA task. We present MinPrompt, a minimal data augmentation framework for open-domain QA based on an approximate graph algorithm and unsupervised question generation. We transform the raw text into a graph structure to build connections between different factual sentences, then apply graph algorithms to identify the minimal set of sentences needed to…
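
The graph step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that two sentences are linked when they share an entity and that the "minimal set" is approximated with a greedy dominating-set heuristic. The function names `build_sentence_graph` and `greedy_dominating_set`, and the toy entity sets, are hypothetical.

```python
from collections import defaultdict


def build_sentence_graph(sentence_entities):
    """Link two sentences whenever they share an entity (assumed linking rule)."""
    by_entity = defaultdict(set)
    for idx, entities in enumerate(sentence_entities):
        for ent in entities:
            by_entity[ent].add(idx)
    graph = {idx: set() for idx in range(len(sentence_entities))}
    for members in by_entity.values():
        for i in members:
            graph[i] |= members - {i}
    return graph


def greedy_dominating_set(graph):
    """Greedy approximation: repeatedly pick the sentence that covers
    the most not-yet-covered sentences (itself plus its neighbours)."""
    uncovered = set(graph)
    chosen = []
    while uncovered:
        # sorted() makes ties resolve to the lowest sentence index
        best = max(sorted(graph), key=lambda n: len(({n} | graph[n]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | graph[best]
    return chosen


# Toy input: one entity set per sentence (hypothetical data).
sentence_entities = [
    {"Paris", "France"},
    {"France", "Europe"},
    {"Europe", "Alps"},
    {"Tokyo", "Japan"},
    {"Japan"},
]
graph = build_sentence_graph(sentence_entities)
selected = greedy_dominating_set(graph)
print(selected)  # [1, 3]
```

In this toy case, sentences 1 and 3 together touch every other sentence through a shared entity, so only those two would be passed on to question generation. The greedy rule trades optimality for speed, which is in the spirit of the "approximate graph algorithm" the abstract mentions, though the exact algorithm used in the paper may differ.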

Code & Models

Repositories

xiusic/MinPrompt

Videos

MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications