Memory limitations are hidden in grammar

Carlos Gómez-Rodríguez; Morten H. Christiansen; Ramon Ferrer-i-Cancho

arXiv:1908.06629·cs.CL·September 22, 2022


TL;DR

This paper investigates how memory limitations influence grammatical structures in human language, revealing that syntactic dependencies are optimized to reduce memory load, challenging the assumption of independence between grammar and cognitive constraints.

Contribution

It demonstrates that memory constraints are embedded in grammatical descriptions, highlighting the importance of cognitive factors in linguistic theory.

Findings

1. Average dependency distance is lower than expected by chance.

2. Memory limitations influence syntactic dependency structures.

3. Grammatical models should incorporate cognitive constraints.


Tables

Table 1: The languages in every collection grouped by family. The counts attached to the collection names indicate the number of different families and the number of different languages. The counts attached to family names indicate the number of different languages.

Collection	Family	Languages
UD (19, 83)	Afro-Asiatic (7)	Akkadian, Amharic, Arabic, Assyrian, Coptic, Hebrew, Maltese
	Turkic (3)	Kazakh, Turkish, Uyghur
	Austro-Asiatic (1)	Vietnamese
	Austronesian (2)	Indonesian, Tagalog
	Basque (1)	Basque
	Dravidian (2)	Tamil, Telugu
	Indo-European (46)	Afrikaans, Ancient Greek, Armenian, Belarusian, Breton, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Faroese, French, Galician, German, Gothic, Greek, Hindi, Hindi-English, Irish, Italian, Kurmanji, Latin, Latvian, Lithuanian, Marathi, Norwegian, Old Church Slavonic, Old French, Old Russian, Persian, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, Upper Sorbian, Urdu, Welsh
	Japanese (1)	Japanese
	Korean (1)	Korean
	Mande (1)	Bambara
	Mongolic (1)	Buryat
	Niger-Congo (2)	Wolof, Yoruba
	Other (1)	Naija
	Pama-Nyungan (1)	Warlpiri
	Sign Language (1)	Swedish Sign Language
	Sino-Tibetan (3)	Cantonese, Chinese, Classical Chinese
	Tai-Kadai (1)	Thai
	Tupian (1)	Mbya Guarani
	Uralic (7)	Erzya, Estonian, Finnish, Hungarian, Karelian, Komi Zyrian, North Sami
Stanford (7, 30)	Afro-Asiatic (1)	Arabic
	Turkic (1)	Turkish
	Basque (1)	Basque
	Dravidian (2)	Tamil, Telugu
	Indo-European (21)	Ancient Greek, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, German, Greek, Hindi, Italian, Latin, Persian, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish
	Japanese (1)	Japanese
	Uralic (3)	Estonian, Finnish, Hungarian
Prague (7, 30)	Afro-Asiatic (1)	Arabic
	Turkic (1)	Turkish
	Basque (1)	Basque
	Dravidian (2)	Tamil, Telugu
	Indo-European (21)	Ancient Greek, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, German, Greek, Hindi, Italian, Latin, Persian, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish
	Japanese (1)	Japanese
	Uralic (3)	Estonian, Finnish, Hungarian

Equations

U = (n_{max} - n^{*}) S + \sum_{n = n_{min}}^{n^{*}} n T(n) = (n_{max} - n^{*}) S + \sum_{n = n_{min}}^{n^{*}} n^{n-1}.

U \approx 1.6 \cdot 10^{10}

\left<d\right>_{rla} = (n + 1)/3


Full text

Memory limitations are hidden in grammar

Carlos Gómez-Rodríguez (ORCID 0000-0003-0752-8812)

Morten H. Christiansen (ORCID 0000-0002-3850-0655)

Ramon Ferrer-i-Cancho (ORCID 0000-0002-7820-923X)

Universidade da Coruña, CITIC, FASTPARSE Lab, LyS Research Group, Depto. de Ciencias de la Computación y Tecnologías de la Información, A Coruña, Spain.

Department of Psychology, Cornell University, Ithaca, NY, USA.

Interacting Minds Centre and School of Communication and Culture, Nobelparken, Aarhus University, Denmark

Complexity and Quantitative Linguistics Lab, LARCA Research Group, Departament de Ciències de la Computació, Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, Spain.

Abstract

The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.

Keywords: dependency syntax, dependency distance minimization, memory, grammar, network science

Journal: Glottometrics

1 Introduction

An often celebrated aspect of human language is its capacity to produce an unbounded number of different sentences [Chomsky, 1965, Miller, 2000]. For many centuries, the goal of linguistics has been to capture this capacity by a formal description—a grammar—consisting of a systematic set of rules and/or principles that determine which sentences are part of a given language and which are not [Bod, 2013]. Over the years, these formal grammars have taken many forms but common to them all is the assumption that they capture the idealized linguistic competence of a native speaker/hearer, independent of any memory limitations or other non-linguistic cognitive constraints [Chomsky, 1965, Miller, 2000]. These abstract formal descriptions have come to play a foundational role in the language sciences, from linguistics, psycholinguistics, and neurolinguistics [Hauser et al., 2002, Pinker, 2003] to computer science, engineering, and machine learning [Klein and Manning, 2003, Dyer et al., 2016, Gómez-Rodríguez et al., 2018]. Despite evidence that processing difficulty underpins the unacceptability of certain sentences [Morrill, 2010, Hawkins, 2004], the cognitive independence assumption that is a defining feature of linguistic competence has not been examined in a systematic way using the tools of formal grammar. It is therefore unclear whether these supposedly idealized descriptions of language are free of non-linguistic cognitive constraints, such as memory limitations.

If the cognitive independence assumption should turn out not to hold, then it would have widespread theoretical and practical implications for our understanding of human linguistic productivity. It would require a reappraisal of key parts of linguistic theory that hitherto have posed formidable challenges for explanations of language processing, acquisition and evolution [Gold, 1967, Hauser et al., 2002, Pinker, 2003], pointing to new ways of thinking about language that may simplify the problem space considerably by making it possible to explain apparently arbitrary aspects of linguistic structure in terms of general learning and processing biases [Christiansen and Chater, 2008, Gómez-Rodríguez and Ferrer-i-Cancho, 2017]. In terms of practical ramifications, engineers may benefit from building human cognitive limitations directly into their natural language processing systems, so as to better mimic human language skills and thereby improve performance. Here, we therefore evaluate the cognitive independence assumption using a state-of-the-art grammatical framework, dependency grammar [Nivre, 2005], to search for possible hidden memory constraints in these formal, idealized descriptions of natural language.

In dependency grammar, the syntactic structure of a sentence is defined by two components. First, a directed graph where vertices are words and arcs indicate syntactic dependencies between a head and its dependent. Such a graph has a root (a vertex that receives no edges) and edges are oriented away from the root (Fig 1). Second, the linear arrangement of the vertices of the graph (defined by the sequential order of the words in a sentence). Thus, syntactic dependency structures constitute a particular kind of spatial network where the graph is embedded in one dimension [Barthélemy, 2018], a correspondence that has led to the development of syntactic theory from a network theory standpoint [Gómez-Rodríguez and Ferrer-i-Cancho, 2017].

Dependency grammar is an important framework for various reasons. First, categorial grammar defines the syntactic structure of a sentence as dependency grammar does [Morrill, 2010]. Second, equivalences exist between certain formalisms of dependency grammar and constituency grammar [Gaifman, 1965, Kahane and Mazziotta, 2015]. Third, there has been an evolution of minimalism towards dependency grammar [Osborne et al., 2011]. Finally, dependency grammar has become a de facto standard in computational linguistics [Kübler et al., 2009].

To delimit the set of possible grammatical descriptions, various classes or sets of syntactic dependency structures have been proposed. These classes can be seen as filters on the possible linear arrangements of a given tree. Here, we consider four main classes. First, consider planar structures, where edges do not cross when drawn above the words of the sentence. The structures in Figs 1B-C are planar while that of Fig 1A is not. Second, we have projective structures, the most well-known class. A dependency tree is projective if, and only if, it is planar and its root is not covered by any dependency (Fig 1C). Third, there are mildly non-projective structures, comprising the union of planar structures and additional structures with further (but slight) deviations from projectivity, e.g., by having a low number of edge crossings (Fig 1A). Finally, there is the class of all structures, which places no constraints on the possible structures.
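The planar and projective filters described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure used in the paper: it assumes a hypothetical encoding in which `heads[i]` is the 0-based position of the head of the word at position `i`, with `-1` marking the root, and the function names are ours.

```python
from itertools import combinations


def edges(heads):
    """Return dependencies as (left, right) position pairs, skipping the root."""
    return [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h != -1]


def crossings(heads):
    """Count pairs of dependencies that cross when drawn above the sentence."""
    count = 0
    for (a, b), (c, d) in combinations(edges(heads), 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly between the endpoints of the other.
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


def is_planar(heads):
    """Planar: no two dependencies cross."""
    return crossings(heads) == 0


def is_projective(heads):
    """Projective: planar and the root is not covered by any dependency."""
    root = heads.index(-1)
    covered = any(a < root < b for a, b in edges(heads))
    return is_planar(heads) and not covered


print(is_projective([1, -1, 1]))   # root in the middle, two adjacent arcs
print(is_planar([2, -1, 1]), is_projective([2, -1, 1]))  # root is covered
print(crossings([2, 3, -1, 2]))    # arcs (0,2) and (1,3) cross
```

Under this encoding, counting crossings also gives a starting point for the mildly non-projective classes, which bound deviations from projectivity in more elaborate ways.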

Fig. 1D shows the inclusion relationships among these classes. However, the whole picture, encompassing state-of-the-art classes, is more complex. Mildly non-projective structures are not actually a class but a family of classes. We have selected three representative classes: $MH_{k}$, $WG_{1}$ and $1EC$ structures, which are supersets of projective structures but whose definition is more complex (see Methods).

Here we evaluate the assumption of independence between grammatical constraints and cognitive limitations in these classes of grammar, using the distance between syntactically related words in a dependency tree as a proxy for memory constraints [Liu et al., 2017, Temperley and Gildea, 2018]. Such a distance is defined as the number of intermediate words plus one. Thus, if the linked words are consecutive they are at distance 1, if they are separated by one intermediate word they are at distance 2, and so on, as shown in Fig 1. Dependency distance minimization is a pressure to reduce the distance between syntactically related words that is supported statistically by large-scale analyses of syntactic dependency structures in languages [Liu, 2008, Futrell et al., 2015, 2020, Jing et al., 2021, Ferrer-i-Cancho et al., 2022]. As such, dependency distance minimization is a type of memory constraint, believed to result from pressure against decay of activation or interference during the processing of sentences [Liu et al., 2017, Temperley and Gildea, 2018]. Dependency distances tax memory and cognition in general. Dependency distances decrease in cases of cognitive impairment [Roark et al., 2011, Aronsson et al., 2021]. There is an association between the level of cognitive impairment and dependency distance: as the severity of the impairment increases, dependency distances tend to be reduced [Aronsson et al., 2021]. Moreover, an association between the level of competence of L2 learners and dependency distance has also been found: as learners of a second language become more competent in the new language, dependency distances increase [Ouyang and Jiang, 2018, Yuan et al., 2021].
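As a minimal sketch of the distance definition above (number of intermediate words plus one), the snippet below computes the dependency distances of a single sentence. The `heads` encoding (`heads[i]` is the 0-based position of the head of word `i`, `-1` for the root) and the function names are assumptions made for illustration.

```python
def dependency_distances(heads):
    """Distance of each dependency: number of intermediate words plus one,
    i.e. the absolute difference between the positions of the two words."""
    return [abs(i - h) for i, h in enumerate(heads) if h != -1]


def mean_dependency_distance(heads):
    """Average dependency distance of one sentence (needs at least two words)."""
    distances = dependency_distances(heads)
    return sum(distances) / len(distances)


# Four-word sentence: word 0 depends on word 2, word 1 on word 0,
# word 2 is the root, and word 3 depends on word 2.
heads = [2, 0, -1, 2]
print(dependency_distances(heads))
print(mean_dependency_distance(heads))
```

Consecutive words give `abs(i - h) == 1`, and one intermediate word gives 2, matching the definition in the text.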

The article is written so that reading the next section, Materials and methods (Section 2), is not essential to understand the Results section (Section 3). Therefore, it is up to the reader to decide whether to proceed with Section 2 or to skip to Section 3, reading Section 2 later on.

2 Materials and methods

Control for sentence length

In our study, we do not investigate the average dependency distance over a whole ensemble of dependency structures; instead, we condition on sentence length [Ferrer-i-Cancho and Liu, 2014, Futrell et al., 2015]. For a given sentence length $n$, we calculate $\left<d\right>_{AS}$, the average dependency length for an ensemble of artificial syntactic dependency structures (AS), and $\left<d\right>_{RS}$, the average dependency length for an ensemble of attested syntactic dependency structures (RS). In this way, we control for sentence length, removing the possible influence of the distribution of sentence length on the calculation of $\left<d\right>_{RS}$ or $\left<d\right>_{AS}$ [Ferrer-i-Cancho and Liu, 2014].
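Conditioning on sentence length can be sketched as a simple grouping step. The snippet is an assumption-laden illustration rather than the authors' code: it uses a hypothetical `heads` encoding (`heads[i]` is the 0-based head position of word `i`, `-1` for the root) and pools all dependencies of the trees that share the same $n$. Since every tree with $n$ words has $n - 1$ dependencies, this pooled average equals the average of the per-sentence means.

```python
from collections import defaultdict


def mean_distance_by_length(trees):
    """Map each sentence length n to the mean dependency distance over all
    dependencies of the n-word trees in the ensemble."""
    total = defaultdict(int)   # sum of dependency distances per length
    count = defaultdict(int)   # number of dependencies per length
    for heads in trees:
        n = len(heads)
        for i, h in enumerate(heads):
            if h != -1:        # the root has no incoming dependency
                total[n] += abs(i - h)
                count[n] += 1
    return {n: total[n] / count[n] for n in sorted(total)}


# Toy ensemble: two 3-word trees and one 4-word tree.
ensemble = [[1, -1, 1], [2, -1, 1], [2, 0, -1, 2]]
print(mean_distance_by_length(ensemble))
```

The same function could be applied to an attested ensemble and to an artificial one, so that the two averages are compared length by length.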

Attested syntactic dependency structures

We estimated the average dependency distances in attested sentences using collections of syntactic dependency treebanks from different languages. A syntactic dependency treebank is a database of sentences and their syntactic dependency trees.

To provide results on a wide range of languages while controlling for the effects of different syntactic annotation theories, we use two collections of treebanks:

1. Universal Dependencies (UD), version 2.4 [Nivre et al., 2019]. This is the largest available collection of syntactic dependency treebanks, featuring 146 treebanks from 83 distinct languages. All of these treebanks are annotated following the common Universal Dependencies annotation criteria, which are a variant of the Stanford Dependencies for English [de Marneffe and Manning, 2008], based on lexical-functional grammar [Bresnan, 2000], adapting them to be able to represent syntactic phenomena in diverse languages under a common framework. This collection of treebanks can be freely downloaded (https://universaldependencies.org/, last accessed 17 February 2022) and is available under free licenses.

2. HamleDT 2.0 [Rosa et al., 2014]. This collection is smaller than UD, featuring 30 languages, all of which (except for one: Bengali) are also available in UD, often with overlapping source material. Thus, using this collection does not meaningfully extend the diversity of languages covered beyond using only UD. However, the interest of HamleDT 2.0 lies in the fact that each of the 30 treebanks is annotated with not one, but two different sets of annotation criteria: Universal Stanford dependencies [de Marneffe et al., 2014] and Prague Dependencies [Hajič et al., 2006]. We abbreviate these two subsets of the HamleDT 2.0 collection as “Stanford” and “Prague”, respectively. While Universal Stanford dependencies are closely related to UD, Prague dependencies provide a significantly different view of syntax, as they are based on the functional generative description [Sgall, 1969] of the Praguian linguistic tradition [Hajicova, 1995], which differs from Stanford dependencies in substantial ways, like the annotation of conjunctions or adpositions [Passarotti, 2016]. Thus, using this version of HamleDT makes our analysis more robust, as we can draw conclusions without being tied to a single linguistic tradition. (While there is a later version, HamleDT 3.0, it abandoned the dual annotation and adopted Universal Dependencies instead, thus making it less useful for our purposes.) The HamleDT 2.0 treebanks are available online (https://ufal.mff.cuni.cz/hamledt/hamledt-treebanks-20, last accessed 17 February 2022). While not all of the treebanks are made fully available to the public under free licenses, to reproduce our analysis it is sufficient to use a stripped version where the words have been removed from the sentences for licensing reasons, but the bare trees are available. This version is distributed freely (https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0023-9551-4?show=full, last accessed 17 February 2022).

A preprocessed file with the minimal information needed to reproduce our measurements on attested syntactic structures (Fig 6A) is available at https://doi.org/10.7910/DVN/XHRIYX.

To preprocess the treebanks for our analysis, we removed punctuation, following common practice in statistical research of dependency structures [Gómez-Rodríguez and Ferrer-i-Cancho, 2017]. We also removed tree nodes that do not correspond to actual words, such as the null elements in the Bengali, Hindi and Telugu HamleDT corpora and the empty nodes in several UD treebanks. To ensure that the dependency structures are still valid trees after these removals, we reattached nodes whose head had been deleted as dependents of their nearest non-deleted ancestor. Finally, in our analysis we disregarded syntactic trees with fewer than three nodes, as their statistical properties are trivial and provide no useful information (a single-node dependency tree has no dependencies at all, and a 2-node tree always has a single dependency of distance 1). Table 2 summarizes the languages in each collection of treebanks.
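The removal and reattachment step can be sketched as follows. This is a hypothetical illustration, not the authors' preprocessing script: it assumes trees encoded as a `heads` list (`heads[i]` is the 0-based head position of word `i`, `-1` for the root), takes the positions to delete as a set `drop`, and assumes the root itself is never deleted.

```python
def remove_nodes(heads, drop):
    """Delete the positions in `drop` and reattach each orphaned node to its
    nearest non-deleted ancestor. Assumes the root is not in `drop`."""

    def kept_ancestor(h):
        # Climb towards the root until reaching a surviving node (or -1).
        while h != -1 and h in drop:
            h = heads[h]
        return h

    kept = [i for i in range(len(heads)) if i not in drop]
    new_index = {old: new for new, old in enumerate(kept)}  # renumber positions
    result = []
    for i in kept:
        h = kept_ancestor(heads[i])
        result.append(-1 if h == -1 else new_index[h])
    return result


def keep_tree(heads):
    """Trees with fewer than three nodes are disregarded."""
    return len(heads) >= 3


# Position 1 (e.g. a null element) is deleted; its dependents at
# positions 0 and 2 are reattached to its head, the root at position 3.
cleaned = remove_nodes([1, 3, 1, -1, 3], {1})
print(cleaned, keep_tree(cleaned))
```

Renumbering the surviving positions matters here because dependency distances are measured on the sentence after the removed tokens are gone.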

Bibliography


1. Albert Park, Y., Levy, R., 2009. Minimal-length linearizations for mildly context-sensitive dependency trees, in: Proceedings of the 10th Annual Meeting of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) conference, Association for Computational Linguistics, Stroudsburg, PA, USA. pp. 335–343.
2. Alemany-Puig, L., 2019. Edge crossings in linear arrangements: From theory to algorithms and applications. Master's thesis. Barcelona School of Informatics.
3. Aronsson, F.S., Kuhlmann, M., Jelic, V., Östberg, P., 2021. Is cognitive impairment associated with reduced syntactic complexity in writing? Evidence from automated text analysis. Aphasiology 35, 900–913. doi: 10.1080/02687038.2020.1742282.
4. Barthélemy, M., 2018. Morphogenesis of Spatial Networks. Springer, Cham, Switzerland.
5. Bod, R., 2013. A New History of the Humanities: The Search for Principles and Patterns from Antiquity to the Present. Oxford University Press, Oxford, UK.
6. Bodirsky, M., Kuhlmann, M., Möhl, M., 2005. Well-nested drawings as models of syntactic structure, in: 10th Conference on Formal Grammar and 9th Meeting on Mathematics of Language, Edinburgh, Scotland, UK. pp. 195–203.
7. Bresnan, J., 2000. Lexical-Functional Syntax. Blackwell, Chichester, United Kingdom.
8. Cayley, A., 1889. A theorem on trees. Quart. J. Math 23, 376–378.