Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen

TL;DR
This survey reviews how linguistic typology informs multilingual NLP, highlights the limitations of current typological databases, and proposes data-driven methods for integrating typological features more effectively into NLP systems.
Contribution
It provides a comprehensive overview of typological data use in NLP and advocates for data-driven approaches to enhance integration of typological knowledge.
Findings
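Using information from existing typological databases has yielded consistent but modest gains in NLP system performance.
Limitations include gaps in database coverage, coarse feature granularity, and under-use of the typological features that are available.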
Data-driven induction can better adapt typological features to NLP.
Abstract
Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of…
