Disambiguatory Signals are Stronger in Word-initial Positions

Tiago Pimentel; Ryan Cotterell; Brian Roark

arXiv:2102.02183 · cs.CL · February 4, 2021

TL;DR

This study investigates whether words across languages tend to encode more information at the beginning than at the end. It addresses methodological issues in previous work and provides evidence for a cross-linguistic pattern of front-loaded information in words.

Contribution

The paper introduces new measures that compare the informativeness of segments at different positions in a word while controlling for the confounds of earlier methods, and uses them to confirm a cross-linguistic tendency toward front-loaded information.

Findings

01

Evidence of stronger disambiguatory signals in word-initial positions across hundreds of languages.

02

New measures successfully control for methodological confounds in informativeness analysis.

03

Support for the hypothesis that languages evolve to front-load information in words.

Abstract

Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower). This has led to the conjecture, as in Wedel et al. (2019b) but common elsewhere, that languages have evolved to provide more information earlier in words than later. Information-theoretic methods to establish such tendencies in lexicons have suffered from several methodological shortcomings that leave open the question of whether this high word-initial informativeness is actually a property of the lexicon or simply an artefact of the incremental nature of recognition. In this paper, we point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word,…
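The confound tied to incremental recognition can be illustrated with a toy comparison of conditional entropies. This is a minimal sketch under stated assumptions, not the authors' method: it uses maximum-likelihood counts over a hypothetical eight-word lexicon (the official repository listed below is a PyTorch implementation), and the helpers `entropy` and `conditional_entropy` are hypothetical names. Reading left to right, the first segment is predicted with no context while the last is predicted from the whole prefix, so entropy can fall across positions simply because context grows. Reading right to left gives the final segment the same empty context as the initial one.

```python
import math
from collections import Counter, defaultdict


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(words, position, direction="forward"):
    """H(S_t | context) in bits, using maximum-likelihood counts over word types.

    Hypothetical toy measure (not the paper's estimator):
    - "forward": context is the prefix s[:position], read left to right.
    - "backward": each word is reversed first, so position 0 is the
      word-final segment and the context is the (reversed) suffix.
    Only words longer than `position` contribute.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    seqs = [w if direction == "forward" else w[::-1] for w in words]
    seqs = [s for s in seqs if len(s) > position]
    # Group next-segment counts by their preceding context.
    by_context = defaultdict(Counter)
    for s in seqs:
        by_context[s[:position]][s[position]] += 1
    n = len(seqs)
    # Weight each context's entropy by how often that context occurs.
    return sum(sum(c.values()) / n * entropy(c) for c in by_context.values())


# Hypothetical toy lexicon of three-segment words.
lexicon = ["cat", "cap", "can", "bat", "bad", "ban", "dog", "dot"]
print(conditional_entropy(lexicon, 0, "forward"))   # initial, no context: ~1.56
print(conditional_entropy(lexicon, 2, "forward"))   # final, given prefix: ~1.44
print(conditional_entropy(lexicon, 0, "backward"))  # final, no context: ~2.16
```

In this toy lexicon the left-to-right comparison makes the initial segment look more informative than the final one (about 1.56 vs. 1.44 bits), yet when both positions are measured without context the final segment carries more (about 2.16 bits). Here the apparent front-loading comes from conditioning, not from the lexicon; real lexicons need smoothed or learned models.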

Code & Models

Repositories

tpimentelms/frontload-disambiguation
PyTorch · Official
