False perspectives on human language: why statistics needs linguistics

Matteo Greco; Andrea Cometa; Fiorenzo Artoni; Robert Frank; Andrea Moro

arXiv:2302.08822·cs.CL·February 20, 2023

Open Access

TL;DR

This paper argues that statistical and structural linguistic models are not mutually exclusive, demonstrating that surprisal measures reflecting syntactic structure better explain language regularities.

Contribution

It shows empirically that surprisal models incorporating syntactic structure outperform purely surface-based models in capturing language regularities.

Findings

01 Syntactic surprisal models better predict language patterns

02 Statistical measures can be grounded in structural linguistic models

03 The dichotomy between statistics and linguistics is false

Abstract

There is a sharp tension over the nature of human language between two opposing camps: those who hold that statistical surface distributions, in particular measures such as surprisal, provide a better understanding of language processing, and those who hold that discrete hierarchical structures encoding linguistic information, such as syntactic structures, are the better tool. In this paper, we show that this dichotomy is a false one. Relying on the fact that statistical measures can be defined on the basis of either structural or non-structural models, we provide empirical evidence that only models of surprisal that reflect syntactic structure are able to account for language regularities.
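For readers unfamiliar with the measure, surprisal is conventionally defined as the negative log probability of an item given its context, so its value depends entirely on which model supplies that probability. The sketch below is an illustration only and is not the authors' method: it computes surprisal in bits under a toy add-one-smoothed bigram model, which is a purely surface-based (non-structural) estimator. The function names, the `<s>` start symbol, and the toy corpus are assumptions introduced here.

```python
import math
from collections import Counter


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(item | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def bigram_surprisals(tokens, corpus):
    """Per-token surprisal under an add-one-smoothed bigram model.

    `corpus` is a list of token lists used to estimate counts;
    `tokens` is the sequence to score. Both are hypothetical inputs.
    """
    vocab = {t for sent in corpus for t in sent} | set(tokens)
    context_counts = Counter()  # how often each symbol appears as a context
    bigram_counts = Counter()   # how often each (context, next) pair appears
    for sent in corpus:
        padded = ["<s>"] + list(sent)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    scores = []
    padded = ["<s>"] + list(tokens)
    for prev, cur in zip(padded, padded[1:]):
        # Add-one (Laplace) smoothing keeps unseen pairs at non-zero probability.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
        scores.append(surprisal(p))
    return scores


toy_corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
scores = bigram_surprisals(["the", "dog"], toy_corpus)
```

The estimator works on any symbol sequence, so it could equally be run over category labels instead of words. A structure-sensitive surprisal would instead condition each prediction on a syntactic state, such as a partial parse, which this sketch does not implement.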

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution