Enhancing SLM via ChatGPT and Dataset Augmentation

Tom Pieper, Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, and Kai-Uwe Kühnberger

arXiv:2409.12599·cs.CL·September 20, 2024


TL;DR

This paper shows that fine-tuning small language models on training data augmented with synthetic rationales generated by ChatGPT-3.5-Turbo, a knowledge-distillation-based approach, improves their performance on natural language inference tasks and offers a cost-effective alternative to large models.

Contribution

The paper introduces a novel dataset augmentation method using ChatGPT-3.5-Turbo to enhance small language models' performance on NLI tasks, reducing reliance on human annotation.

Findings

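1. Synthetic rationales raise classification accuracy on ANLI by 1.3% and 2.3% for the two rationale types (information extraction and informed reasoning), respectively.

2. Knowledge distillation with augmented data enhances small model capabilities.

3. The approach is a cost-effective way to improve NLP model performance.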

Abstract

This paper explores the enhancement of small language models through strategic dataset augmentation via ChatGPT-3.5-Turbo, in the domain of Natural Language Inference (NLI). By employing knowledge distillation-based techniques and synthetic dataset augmentation, we aim to bridge the performance gap between large language models (LLMs) and small language models (SLMs) without the immense cost of human annotation. Our methods involve two forms of rationale generation (information extraction and informed reasoning) to enrich the ANLI dataset. We then fine-tune T5-Small on these augmented datasets, evaluating its performance against an established benchmark. Our findings reveal that the incorporation of synthetic rationales significantly improves the model's ability to comprehend natural language, leading to 1.3% and 2.3% higher classification accuracy, respectively, on the ANLI dataset,…
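To make the augmentation step concrete, the following is a minimal sketch of how an NLI example carrying a synthetic rationale might be serialized into a text-to-text pair for a model such as T5-Small. The abstract does not specify the prompt format or whether the rationale is placed in the input or the target, so the field names, the `nli premise:` / `hypothesis:` / `rationale:` prefixes, the label order, and the `to_text_pair` helper are all illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed label order; check it against the dataset release you actually use.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: int                       # index into LABELS
    rationale: Optional[str] = None  # synthetic rationale, if one was generated


def to_text_pair(ex: NLIExample) -> Tuple[str, str]:
    """Return (source, target) strings for a text-to-text model (hypothetical format)."""
    source = f"nli premise: {ex.premise} hypothesis: {ex.hypothesis}"
    if ex.rationale:
        # Assumption: the rationale is appended to the input text.
        source += f" rationale: {ex.rationale}"
    return source, LABELS[ex.label]


# Hand-written toy example, not taken from ANLI.
example = NLIExample(
    premise="The museum opened in 1901.",
    hypothesis="The museum opened in the 19th century.",
    label=2,
    rationale="1901 falls in the 20th century.",
)
source, target = to_text_pair(example)
```

Examples without a rationale pass through the same function unchanged apart from the missing suffix, which would let a baseline and an augmented training set share one preprocessing path.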


Taxonomy

Topics: Traffic Prediction and Management Techniques

Methods: Knowledge Distillation