Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

Zheyu Zhang; Han Yang; Bolei Ma; David Rügamer; Ercong Nie

arXiv:2308.01684·cs.CL·October 24, 2023

TL;DR

This paper introduces CoThought, a pipeline that leverages large language models to restructure small datasets, enabling the training of compact models that outperform standard baselines on various language understanding tasks.

Contribution

The paper presents a novel method for training small language models by using LLMs to generate task-oriented data, improving their performance on multiple benchmarks.

Findings

01

The BabyLM outperforms vanilla RoBERTa by more than 3 points on 10 linguistic, NLU, and question-answering tasks across 4 benchmarks.

02

Reconstructed datasets enable small models to better understand contextual information.

03

The approach improves training efficiency for compact language models.

Abstract

Large Language Models (LLMs) demonstrate remarkable performance on a variety of natural language understanding (NLU) tasks, primarily due to their in-context learning ability. This ability could be applied to building babylike models, i.e. models at small scales, improving training efficiency. In this paper, we propose a "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs) by leveraging the Chain of Thought prompting of LLMs. Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to the school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa fashion. In evaluations across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10 linguistic, NLU, and question-answering tasks by more than 3…

Code & Models

Repositories

oooranz/baby-cothought (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Byte Pair Encoding · Weight Decay · Linear Warmup With Linear Decay · Softmax