Language Acquisition Device in Large Language Models

Masato Mita; Taiga Someya; Ryo Yoshida; Yohei Oseki

arXiv:2605.16758 · cs.CL · May 19, 2026

TL;DR

This paper introduces LAD-inspired pre-pretraining (PPT) on MP-STRUCT, a hierarchical formal language, to improve the data efficiency of large language models and their resistance to structurally implausible languages, challenging the prior assumption that PPT requires highly expressive formal languages.

Contribution

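The paper proposes LAD-inspired pre-pretraining on MP-STRUCT, shows that it matches strong formal-language baselines in token efficiency while adding resistance to structurally implausible languages such as REVERSE, and analyzes simplified variants to identify contributing factors such as dependency resolution.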

Findings

01

A 500-step pre-pretraining (PPT) run on MP-STRUCT matches formal-language baselines in token efficiency

02

MP-STRUCT CORE outperforms $k$-Shuffle Dyck despite lower formal expressivity

03

Functional landmarks reduce dependency ambiguity, aiding effective PPT

Abstract

Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages such as $k$-Shuffle Dyck. Inspired by the Language Acquisition Device (LAD) hypothesis, which posits that innate constraints preemptively restrict the learner's hypothesis space to natural-language-like structure, we propose LAD-inspired PPT: pre-pretraining on MP-STRUCT, a formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE. A brief 500-step PPT with MP-STRUCT matches strong formal-language baselines in token efficiency while additionally imparting a human-like resistance to structurally implausible languages (e.g., REVERSE). Analyzing simplified variants, we find…
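
For readers unfamiliar with the $k$-Shuffle Dyck baseline named in the abstract, the following is a minimal sketch of its standard definition: a string over $k$ bracket types belongs to the language when each type is balanced on its own, while different types may interleave freely. It is not the paper's data generator, and it does not model MP-STRUCT. The function names, the `(type, is_open)` token encoding, and the `p_open` parameter are assumptions made for illustration.

```python
import random


def is_k_shuffle_dyck(tokens, k):
    """Check membership under the standard k-Shuffle Dyck definition.

    tokens: list of (bracket_type, is_open) pairs (hypothetical encoding).
    Each bracket type keeps its own counter, so types may cross each other.
    """
    depth = [0] * k
    for bracket_type, is_open in tokens:
        if is_open:
            depth[bracket_type] += 1
        else:
            if depth[bracket_type] == 0:
                return False  # a close with no matching open of this type
            depth[bracket_type] -= 1
    return all(d == 0 for d in depth)


def sample_k_shuffle_dyck(k, length, rng, p_open=0.5):
    """Sample one string of exactly `length` tokens (length must be even).

    Illustrative sampler only; p_open is an assumed knob, not a value
    taken from the paper.
    """
    if length < 0 or length % 2 != 0:
        raise ValueError("length must be a non-negative even number")
    depth = [0] * k
    out = []
    while len(out) < length:
        remaining = length - len(out)
        open_types = [t for t in range(k) if depth[t] > 0]
        # Every pending open must still be closed in the remaining slots.
        must_close = sum(depth) == remaining
        if open_types and (must_close or rng.random() >= p_open):
            t = rng.choice(open_types)
            depth[t] -= 1
            out.append((t, False))
        else:
            t = rng.randrange(k)
            depth[t] += 1
            out.append((t, True))
    return out


# "( [ ) ]" with type 0 = round, type 1 = square: the brackets cross,
# so this is not a nested Dyck-2 string, but it is in 2-Shuffle Dyck.
crossing = [(0, True), (1, True), (0, False), (1, False)]
print(is_k_shuffle_dyck(crossing, k=2))
```

The crossing example shows why the language counts as highly expressive: dependencies of different types can overlap rather than nest, which a single stack cannot track.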
