Towards a theory of how the structure of language is acquired by deep neural networks

Francesco Cagnetta; Matthieu Wyart

arXiv:2406.00048·cs.CL·October 30, 2024

TL;DR

This paper investigates how deep neural networks learn language structure by analyzing synthetic datasets generated from a probabilistic context-free grammar, revealing how training data size influences the depth of learned hierarchical representations.

Contribution

It provides an analytical framework linking training set size to the depth of hierarchical structure learned by language models, supported by empirical validation on real texts.

Findings

01 The effective range of resolvable token-token correlations grows with training set size

02 Deeper grammar representations improve model performance

03 Scaling laws depend on context window length

Abstract

How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a tree-like generative model that captures many of the hierarchical structures found in natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship…
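
The setup described in the abstract can be illustrated with a minimal sketch. It assumes a toy tree-like grammar whose production rules are drawn at random; the parameter names (`vocab_size`, `branching`, `num_rules`, `depth`) and the particular correlation measure are hypothetical choices made for illustration, not the authors' definitions. The sketch samples sentences, then estimates the correlation between the last token and tokens at increasing distance from it.

```python
import random
from collections import Counter


def make_grammar(vocab_size, branching, num_rules, depth, seed=0):
    """Draw random production rules for each level of a toy tree grammar.

    grammar[level][symbol] is a list of `num_rules` tuples, each holding the
    `branching` child symbols that `symbol` can expand into (an assumption
    made for illustration only).
    """
    rng = random.Random(seed)
    return [
        {
            symbol: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(num_rules)
            ]
            for symbol in range(vocab_size)
        }
        for _ in range(depth)
    ]


def sample_sentence(grammar, vocab_size, rng):
    """Expand a random root symbol level by level down to the leaf tokens."""
    symbols = [rng.randrange(vocab_size)]
    for level_rules in grammar:
        symbols = [child for s in symbols for child in rng.choice(level_rules[s])]
    return symbols


def correlation(sentences, i, j):
    """Empirical dependence between positions i and j.

    Returns the Euclidean norm of P(x_i=a, x_j=b) - P(x_i=a) P(x_j=b)
    over all observed symbol pairs (a, b).
    """
    n = len(sentences)
    joint = Counter((s[i], s[j]) for s in sentences)
    marg_i = Counter(s[i] for s in sentences)
    marg_j = Counter(s[j] for s in sentences)
    total = 0.0
    for a in marg_i:
        for b in marg_j:
            diff = joint[(a, b)] / n - (marg_i[a] / n) * (marg_j[b] / n)
            total += diff * diff
    return total ** 0.5


VOCAB, BRANCHING, NUM_RULES, DEPTH = 4, 2, 2, 3
grammar = make_grammar(VOCAB, BRANCHING, NUM_RULES, DEPTH, seed=0)
rng = random.Random(1)

for num_sentences in (100, 10_000):
    data = [sample_sentence(grammar, VOCAB, rng) for _ in range(num_sentences)]
    last = len(data[0]) - 1
    # With binary branching, distances 1, 2 and 4 pair the last token with
    # tokens whose lowest common ancestor sits 1, 2 and 3 levels above the leaves.
    for distance in (1, 2, 4):
        value = correlation(data, last, last - distance)
        print(num_sentences, distance, round(value, 4))
```

In a run of this sketch, estimates from the smaller sample carry more sampling noise, so weak long-range correlations are harder to tell apart from zero; this is only a rough analogue of the effective range discussed in the abstract.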

Videos

Towards a theory of how the structure of language is acquired by deep neural networks · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training