A Guide To Effectively Leveraging LLMs for Low-Resource Text Summarization: Data Augmentation and Semi-supervised Approaches
Gaurav Sahu, Olga Vechtomova, Issam H. Laradji

TL;DR
This paper introduces MixSumm and PPSL, two novel LLM-based methods for low-resource text summarization that improve performance using data augmentation and semi-supervised learning, achieving results comparable to fully supervised models with minimal labeled data.
Contribution
The paper presents two innovative approaches, MixSumm and PPSL, that leverage LLMs for data augmentation and semi-supervised learning in low-resource summarization tasks.
Findings
MixSumm synthesizes high-quality documents for few-shot learning.
PPSL generates pseudo-labels for semi-supervised training.
Achieves competitive ROUGE scores with only 5% labeled data.
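ROUGE scores measure n-gram overlap between a generated summary and a reference summary. As a rough illustration of what such a score captures, the sketch below computes ROUGE-1 F1 from clipped unigram counts. It assumes lowercase whitespace tokenization and a single reference, and the function name `rouge1_f1` is hypothetical. Standard tooling such as Google's `rouge-score` package applies its own tokenization and optional stemming, so numbers from this sketch will not match published scores exactly.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper, not a reference implementation).

    Assumes lowercase whitespace tokenization, no stemming, and a single reference.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most min(count in cand, count in ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate unigrams found in the reference
    recall = overlap / sum(ref.values())      # share of reference unigrams recovered by the candidate
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f1("the cat sat on the mat", "the cat is on the mat"), 3))  # 0.833
```

In the usage line, five of the six unigrams match on each side, so precision and recall are both 5/6 and F1 is 5/6.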
Abstract
Existing approaches for low-resource text summarization primarily employ large language models (LLMs) like GPT-3 or GPT-4 at inference time to generate summaries directly; however, such approaches often suffer from inconsistent LLM outputs and are difficult to adapt to domain-specific data in low-resource scenarios. In this work, we propose two novel methods to effectively utilize LLMs for low-resource text summarization: 1) MixSumm, an LLM-based data augmentation regime that synthesizes high-quality documents (short and long) for few-shot text summarization, and 2) PPSL, a prompt-based pseudolabeling strategy for sample-efficient semi-supervised text summarization. Specifically, MixSumm leverages the open-source LLaMA-3-70b-Instruct model to generate new documents by mixing topical information derived from a small seed set, and PPSL leverages the LLaMA-3-70b-Instruct model to generate…
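The abstract describes MixSumm only at a high level: the LLM writes new documents by mixing topical information derived from a small seed set. The sketch below shows one plausible way to assemble such a mixing prompt, assuming the seed set is a list of (topic, document) pairs with known topic labels. The helper `build_mixing_prompt`, the prompt wording, and the one-example-per-topic sampling are hypothetical choices rather than the paper's actual prompt, and the call to LLaMA-3-70b-Instruct itself is left out.

```python
import random
from collections import defaultdict


def build_mixing_prompt(seed_set, num_topics=2, rng=None):
    """Hypothetical sketch of a topic-mixing prompt; not the MixSumm authors' prompt.

    seed_set: list of (topic, document) pairs (assumed format).
    Returns (prompt, chosen_topics).
    """
    rng = rng or random.Random()
    by_topic = defaultdict(list)
    for topic, doc in seed_set:
        by_topic[topic].append(doc)
    if num_topics > len(by_topic):
        raise ValueError("not enough distinct topics in the seed set")
    # Sample distinct topics first, then one example document from each topic.
    topics = rng.sample(sorted(by_topic), num_topics)
    examples = [(t, rng.choice(by_topic[t])) for t in topics]
    parts = ["Here are example documents from different topics:"]
    for i, (topic, doc) in enumerate(examples, start=1):
        parts.append(f"Example {i} (topic: {topic}):\n{doc}")
    # Instruction asking the LLM to blend the sampled topics into one new document.
    parts.append(
        "Write a new document that combines information from the topics "
        + ", ".join(topics)
        + ". Do not copy sentences from the examples."
    )
    return "\n\n".join(parts), topics


# Usage with a toy seed set (made-up documents for illustration only).
seed = [
    ("sports", "The home team won the final in extra time."),
    ("finance", "Shares rose after the quarterly earnings report."),
    ("health", "A new study links sleep to memory consolidation."),
]
prompt, topics = build_mixing_prompt(seed, num_topics=2, rng=random.Random(0))
print(prompt)
```

Sampling distinct topics before picking examples is what pushes the generated document to combine content across topics instead of paraphrasing a single seed document. This excerpt does not say how the topics are obtained, so known labels are an assumption here.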
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Data Quality and Management
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · GPT-4 · Cosine Annealing · Linear Layer
