Language Diversity: Visible to Humans, Exploitable by Machines

Gábor Bella; Erdenebileg Byambadorj; Yamini Chandrashekar; Khuyagbaatar Batsuren; Danish Ashgar Cheema; Fausto Giunchiglia

arXiv:2203.04723·cs.CL·March 10, 2022

TL;DR

The paper introduces the Universal Knowledge Core, a multilingual lexical database that visualizes language diversity for humans and enables cross-lingual applications for machines.

Contribution

It presents a large-scale multilingual lexical database, together with tools and data, for visualizing and exploiting language diversity in more than a thousand languages.

Findings

01. Provides access to millions of words and meanings
02. Shows phenomena like shared meanings and cognate clusters
03. Enables cross-lingual applications

Abstract

The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.

Taxonomy

Topics: Natural Language Processing Techniques