Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Atsuki Yamaguchi; Maggie Mi; Nikolaos Aletras

arXiv:2601.03448·cs.CL·April 17, 2026

TL;DR

This paper introduces L2T, a pre-training framework that incorporates language learning tasks to explicitly enhance linguistic competence in language models, inspired by human language acquisition.

Contribution

The paper presents a pre-training method that combines raw text with structured language learning tasks, improving linguistic competence and accelerating its acquisition in language models.

Findings

01 L2T improves performance on linguistic competence benchmarks.

02 Pre-training with L2T accelerates linguistic skill acquisition.

03 L2T maintains competitive performance on general reasoning tasks.

Abstract

Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence. To bridge this gap, we propose L2T, a pre-training framework integrating Language Learning Tasks alongside standard next-token prediction. Inspired by human language acquisition, L2T transforms raw text into structured input-output pairs to provide explicit linguistic stimulation. Pre-training LMs on a mixture of raw text and L2T data not only improves overall performance on linguistic competence benchmarks but accelerates its acquisition, while maintaining competitive performance on general reasoning tasks.
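
The abstract describes turning raw text into structured input-output pairs and pre-training on a mixture of raw text and task data. The sketch below illustrates that general idea only. The word-reordering task, the function names `make_unshuffle_pair` and `build_mixture`, the prompt wording, and the `task_ratio` parameter are all hypothetical choices made for illustration; they are not taken from the paper or its repository.

```python
import random


def make_unshuffle_pair(sentence: str, rng: random.Random) -> dict:
    """Hypothetical task: restore the original word order of a shuffled sentence."""
    words = sentence.split()
    shuffled = words[:]
    rng.shuffle(shuffled)
    return {
        "input": "Reorder the words: " + " ".join(shuffled),
        "output": sentence,
    }


def build_mixture(sentences, task_ratio: float, seed: int = 0) -> list:
    """Mix raw-text examples with task examples.

    Each sentence becomes a task example with probability `task_ratio`
    (an assumed knob, not a value from the paper); otherwise it is kept
    as raw text for ordinary next-token prediction.
    """
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        if rng.random() < task_ratio:
            pair = make_unshuffle_pair(sentence, rng)
            # Serialize the pair as one training sequence: input, newline, output.
            text = pair["input"] + "\n" + pair["output"]
            examples.append({"kind": "task", "text": text})
        else:
            examples.append({"kind": "raw", "text": sentence})
    return examples


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "Language models are pre-trained on raw text.",
    ]
    for example in build_mixture(corpus, task_ratio=0.5, seed=1):
        print(example["kind"], "|", example["text"].replace("\n", " -> "))
```

Both kinds of example end up as plain text sequences, so under this assumption the same next-token prediction objective can be applied to the whole mixture without changing the training loop.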

Code & Models

Repositories

gucci-j/l2t (GitHub)
