Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Michael Y. Hu; Jackson Petty; Chuan Shi; William Merrill; Tal Linzen

arXiv:2502.19249·cs.CL·May 28, 2025

TL;DR

Pre-pretraining language models on formal languages that mimic the dependency structures of natural language improves their linguistic generalization and token efficiency. Hierarchical formal languages transfer best, and mechanistic evidence supports this transfer.

Contribution

This study demonstrates that pre-pretraining on hierarchical formal languages improves natural language loss, linguistic generalization, and token efficiency in transformer models, highlighting the importance of dependency structures.

Findings

01. Formal languages capturing hierarchical dependencies improve natural language loss.

02. Pre-pretraining reduces training token requirements by 33%.

03. Attention heads trained on formal languages remain crucial for syntactic tasks.

Abstract

Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and complexity theory, we hypothesize that effective transfer occurs when two conditions are met: the formal language should capture the dependency structures present in natural language, and it should remain within the computational limitations of the model architecture. We experiment with pre-pretraining (training on formal language before natural languages) on transformers and find that formal languages capturing hierarchical dependencies indeed enable language models to achieve lower loss on natural language and better linguistic generalization compared to other formal languages. We also find modest support for the hypothesis that the formal language…
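
As an illustration of what a formal language with hierarchical dependencies can look like, the sketch below samples strings of nested, matched brackets (a Dyck-style language). It is an assumption made for illustration, not the paper's data-generation code: the function names `sample_dyck` and `is_balanced`, the bracket vocabulary, and the 0.5 open/close probability are all hypothetical choices.

```python
import random

# Hypothetical bracket vocabulary; each pair is one "dependency type".
BRACKETS = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]


def sample_dyck(num_pairs, num_types=2, rng=None):
    """Sample a balanced bracket string containing `num_pairs` matched pairs."""
    if not 1 <= num_types <= len(BRACKETS):
        raise ValueError("num_types must be between 1 and 4")
    rng = rng or random.Random()
    vocab = BRACKETS[:num_types]
    tokens, stack, opened = [], [], 0
    while opened < num_pairs or stack:
        can_open = opened < num_pairs
        can_close = bool(stack)
        # Open a new bracket if we must, or on a fair coin flip if we may.
        if can_open and (not can_close or rng.random() < 0.5):
            open_tok, close_tok = rng.choice(vocab)
            tokens.append(open_tok)
            stack.append(close_tok)  # remember which closer is owed
            opened += 1
        else:
            # Close the most recently opened bracket (last in, first out).
            tokens.append(stack.pop())
    return "".join(tokens)


def is_balanced(s):
    """Check that every closer matches the most recent unmatched opener."""
    closer_to_opener = {close: open_ for open_, close in BRACKETS}
    stack = []
    for ch in s:
        if ch in closer_to_opener:
            if not stack or stack.pop() != closer_to_opener[ch]:
                return False
        else:
            stack.append(ch)
    return not stack
```

Calling `sample_dyck(8, num_types=3)` returns a 16-character string in which each closing bracket resolves the most recent open one. That last-in, first-out pattern is the nesting structure that "hierarchical dependencies" refers to, in contrast to a string such as `([)]`, where the dependencies cross and `is_balanced` returns `False`.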

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling