TL;DR
This paper introduces L2T, a pre-training framework that incorporates language learning tasks to explicitly enhance linguistic competence in language models, inspired by human language acquisition.
Contribution
It presents a novel pre-training method combining raw text with structured language learning tasks, improving linguistic skills and accelerating learning in language models.
Findings
L2T improves performance on linguistic competence benchmarks.
Pre-training with L2T accelerates linguistic skill acquisition.
L2T maintains competitive performance on general reasoning tasks.
Abstract
Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence. To bridge this gap, we propose L2T, a pre-training framework integrating Language Learning Tasks alongside standard next-token prediction. Inspired by human language acquisition, L2T transforms raw text into structured input-output pairs to provide explicit linguistic stimulation. Pre-training LMs on a mixture of raw text and L2T data not only improves overall performance on linguistic competence benchmarks but also accelerates the acquisition of that competence, while maintaining competitive performance on general reasoning tasks.
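The idea of turning raw text into input-output pairs and mixing them with raw text can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the two task functions, their prompt wording, the `l2t_fraction` parameter, and the per-document sampling scheme are all assumptions made for this example. The task names are loosely suggested by the released checkpoint names, but the abstract does not specify the actual task formats.

```python
import random


def char_count_task(text: str, rng: random.Random) -> dict:
    """Hypothetical task: count occurrences of one letter sampled from the text.

    Assumes the text contains at least one alphabetic character.
    """
    candidates = sorted({c for c in text if c.isalpha()})
    target = rng.choice(candidates)
    return {
        "input": f"Count the character '{target}' in: {text}",
        "output": str(text.count(target)),
    }


def deletion_task(text: str, rng: random.Random) -> dict:
    """Hypothetical task: drop one word; the target is the whitespace-normalized text.

    Assumes the text contains at least one word.
    """
    words = text.split()
    idx = rng.randrange(len(words))
    corrupted = words[:idx] + words[idx + 1:]
    return {
        "input": "Restore the missing word: " + " ".join(corrupted),
        "output": " ".join(words),
    }


def build_mixture(docs: list, l2t_fraction: float, seed: int = 0) -> list:
    """Turn each document into either a raw-text example or a task example.

    l2t_fraction is a hypothetical knob: the probability that a document
    is converted into an input-output pair instead of being kept as raw text.
    """
    rng = random.Random(seed)
    tasks = [char_count_task, deletion_task]
    examples = []
    for doc in docs:
        if rng.random() < l2t_fraction:
            pair = rng.choice(tasks)(doc, rng)
            # Serialize the pair as one sequence so the usual next-token
            # objective can be applied to it unchanged.
            examples.append({"kind": "l2t", "text": pair["input"] + "\n" + pair["output"]})
        else:
            examples.append({"kind": "raw", "text": doc})
    return examples
```

Calling `build_mixture(docs, 0.0)` keeps every document as raw text, while `build_mixture(docs, 1.0)` converts every document into a task example. Both kinds end up as plain token sequences, so in this sketch the training loop itself would not need to change.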
Code & Models
- 🤗 l2t-project/l2t-500m-disjoint (model, 2 downloads)
- 🤗 l2t-project/l2t-500m-disjoint-mix_0 (model, 2 downloads)
- 🤗 l2t-project/l2t-500m-disjoint-mix_25 (model, 2 downloads)
- 🤗 l2t-project/l2t-500m-disjoint-mix_75 (model, 1 download)
- 🤗 l2t-project/raw-1b-disjoint (model)
- 🤗 l2t-project/raw-500m-disjoint (model, 4 downloads)
- 🤗 l2t-project/l2t-1b-disjoint (model, 1 download)
- 🤗 l2t-project/l2t-500m-char_count (model, 1 download)
- 🤗 l2t-project/l2t-500m-deletion (model, 3 downloads)
- 🤗 l2t-project/l2t-500m-half (model, 1 download)
