Basic Reading Distillation
Zhi Zhou, Sirui Miao, Xiangyu Duan, Hao Yang, Min Zhang

TL;DR
This paper introduces basic reading distillation (BRD), a method to train small language models on fundamental reading skills, enabling them to perform competitively with much larger models across various NLP tasks.
Contribution
BRD is a novel approach that teaches small models basic reading skills on generic text. It improves their performance on multiple NLP benchmarks and is complementary to traditional knowledge and task distillation methods.
Findings
Small models trained with BRD outperform or match much larger models.
BRD influences the model's probability distribution effectively.
BRD is orthogonal to existing knowledge and task distillation methods.
Abstract
Large language models (LLMs) have demonstrated remarkable abilities in various natural language processing areas, but they demand high computational resources, which limits their real-world deployment. Distillation is one technique to solve this problem, through either knowledge distillation or task distillation. Both distillation approaches train small models to imitate specific features of LLMs, but both neglect basic reading education for small models on generic texts that are unrelated to downstream tasks. In this paper, we propose basic reading distillation (BRD), which educates a small model to imitate LLMs' basic reading behaviors, such as named entity recognition and question raising and answering, on each sentence. After such basic education, we apply the small model to various tasks, including language inference benchmarks and BIG-bench tasks. It shows that the small model…
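The data-building step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt templates, the sentence splitter, and the names `build_brd_examples` and `stub_teacher` are all assumptions introduced here, and the stub stands in for a real teacher LLM call.

```python
import re
from typing import Callable, Dict, List

# Hypothetical prompt templates; the paper's actual prompts are not given here.
PROMPTS: Dict[str, str] = {
    "ner": "List the named entities in this sentence: {sentence}",
    "qa": "Raise a question about this sentence and answer it: {sentence}",
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_brd_examples(
    text: str, teacher: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Pair each sentence-level prompt with the teacher's output.

    The resulting (input, target) pairs are what a small student model
    would be fine-tuned on in this sketch.
    """
    examples = []
    for sentence in split_sentences(text):
        for behavior, template in PROMPTS.items():
            prompt = template.format(sentence=sentence)
            examples.append(
                {"behavior": behavior, "input": prompt, "target": teacher(prompt)}
            )
    return examples


def stub_teacher(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder string."""
    return f"<teacher output for: {prompt}>"


if __name__ == "__main__":
    generic_text = "Marie Curie was born in Warsaw. She won two Nobel Prizes."
    for example in build_brd_examples(generic_text, stub_teacher):
        print(example["behavior"], "|", example["input"])
```

In this sketch each sentence yields one example per reading behavior, so the generic text needs no connection to any downstream task. Replacing `stub_teacher` with a real model call is left open, since the summary does not say which teacher or prompts were used.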
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
