Language Acquisition Device in Large Language Models
Masato Mita, Taiga Someya, Ryo Yoshida, Yohei Oseki

TL;DR
This paper introduces LAD-inspired pre-pretraining on a hierarchical formal language, MP-STRUCT, to improve data efficiency and structural understanding in large language models, challenging the prior assumption that highly expressive formal languages are necessary for effective pre-pretraining.
Contribution
It proposes a novel LAD-inspired pre-pretraining method using MP-STRUCT, demonstrates improved data efficiency and resistance to structurally implausible languages, and analyzes key factors such as dependency resolution.
Findings
500-step PPT with MP-STRUCT matches formal-language baselines in token efficiency
MP-STRUCT CORE outperforms $k$-Shuffle Dyck despite lower formal expressivity
Functional landmarks reduce dependency ambiguity, aiding effective PPT
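For reference, $k$-Shuffle Dyck (the baseline in the findings above) is usually defined as the shuffle of $k$ Dyck languages over distinct bracket pairs. Each bracket type must be balanced on its own, but different types may interleave and cross freely, so `( [ ) ]` is valid. The sketch below gives a recognizer and a simple sampler. It assumes an integer encoding where token `i < k` opens type `i` and token `k + i` closes it; this convention and the 0.5 open/close probability are illustrative choices, not taken from the paper.

```python
import random
from typing import List, Sequence

def is_k_shuffle_dyck(tokens: Sequence[int], k: int) -> bool:
    """Recognize k-Shuffle Dyck. Token i < k opens bracket type i; token k + i closes it."""
    depth = [0] * k  # one independent counter per bracket type
    for t in tokens:
        if not 0 <= t < 2 * k:
            raise ValueError(f"token {t} is outside the vocabulary of size {2 * k}")
        if t < k:
            depth[t] += 1
        else:
            depth[t - k] -= 1
            if depth[t - k] < 0:  # a closer with no pending opener of its own type
                return False
    return all(d == 0 for d in depth)  # every type must end fully balanced

def sample_k_shuffle_dyck(k: int, n_pairs: int, rng: random.Random) -> List[int]:
    """Sample a string of 2 * n_pairs tokens that is in k-Shuffle Dyck by construction."""
    pending = [0] * k  # open-but-unclosed brackets per type
    to_open = n_pairs  # openers not yet emitted
    out: List[int] = []
    while to_open > 0 or any(pending):
        open_types = [i for i in range(k) if pending[i] > 0]
        if to_open > 0 and (not open_types or rng.random() < 0.5):
            i = rng.randrange(k)
            pending[i] += 1
            to_open -= 1
            out.append(i)
        else:
            # Close any currently open type; types need not close in LIFO order.
            i = rng.choice(open_types)
            pending[i] -= 1
            out.append(k + i)
    return out
```

For example, `is_k_shuffle_dyck([0, 1, 2, 3], k=2)` accepts the crossing string `( [ ) ]`, which ordinary two-bracket Dyck would reject. Only one counter per type is needed because each type's balance is checked independently of the others.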
Abstract
Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages such as $k$-Shuffle Dyck. Inspired by the Language Acquisition Device (LAD) hypothesis, which posits that innate constraints preemptively restrict the learner's hypothesis space to natural-language-like structure, we propose LAD-inspired PPT: pre-pretraining on MP-STRUCT, a formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE. A brief 500-step PPT with MP-STRUCT matches strong formal-language baselines in token efficiency while additionally imparting a human-like resistance to structurally implausible languages (e.g., REVERSE). Analyzing simplified variants, we find…
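To make the three operations named in the abstract concrete, here is a deliberately tiny, hypothetical toy in the spirit of MERGE, AGREE, and MOVE. It is not MP-STRUCT. The lexicon (`N0` to `N4`, `V`), the single number feature, the bracketed serialization, and object fronting as the only movement are all illustrative assumptions. It also includes a toy `reverse` that simply reverses token order, assuming REVERSE follows the usual "impossible language" perturbation of that name; the abstract does not define it.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
NUMS = ("sg", "pl")  # hypothetical agreement feature values

def merge(a: Tree, b: Tree) -> Tree:
    """MERGE: combine two syntactic objects into one binary constituent."""
    return (a, b)

def agree(probe: str, goal_num: str) -> str:
    """AGREE: value the probe's number feature with the goal's value."""
    return f"{probe}.{goal_num}"

def move(tree: Tree, path: Tuple[int, ...]) -> Tree:
    """MOVE: re-merge the constituent at `path` at the root, leaving a gap '_'."""
    def extract(t: Tree, p: Tuple[int, ...]) -> Tuple[Tree, Tree]:
        if not p:
            return t, "_"  # the moved item is replaced by a gap in its base position
        i = p[0]
        moved, new_child = extract(t[i], p[1:])
        return moved, t[:i] + (new_child,) + t[i + 1:]
    moved, remnant = extract(tree, path)
    return merge(moved, remnant)

def serialize(tree: Tree) -> List[str]:
    """Flatten a tree into tokens, marking constituents with explicit brackets."""
    if isinstance(tree, tuple):
        return ["["] + [tok for child in tree for tok in serialize(child)] + ["]"]
    return [tree]

def reverse(tokens: List[str]) -> List[str]:
    """Toy REVERSE perturbation: the same tokens in reversed order."""
    return tokens[::-1]

def sample_clause(rng: random.Random, n_nouns: int = 5, p_move: float = 0.5) -> Tree:
    """Sample one toy clause [subj [V obj]], optionally with the object fronted."""
    subj_num, obj_num = rng.choice(NUMS), rng.choice(NUMS)
    subj = f"N{rng.randrange(n_nouns)}.{subj_num}"
    obj = f"N{rng.randrange(n_nouns)}.{obj_num}"
    verb = agree("V", subj_num)             # the verb agrees with the subject
    clause = merge(subj, merge(verb, obj))  # [subj [V obj]]
    if rng.random() < p_move:
        clause = move(clause, (1, 1))       # front the object, leaving a gap
    return clause
```

For instance, `serialize(move(merge("N0.pl", merge(agree("V", "pl"), "N1.sg")), (1, 1)))` yields `['[', 'N1.sg', '[', 'N0.pl', '[', 'V.pl', '_', ']', ']', ']']`: the fronted object and its gap form a long-distance dependency, and `V.pl` must match the subject's number rather than the nearest noun.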
