Language Diversity: Visible to Humans, Exploitable by Machines
Gábor Bella, Erdenebileg Byambadorj, Yamini Chandrashekar, Khuyagbaatar Batsuren, Danish Ashgar Cheema, Fausto Giunchiglia

TL;DR
The paper introduces the Universal Knowledge Core, a multilingual lexical database that visualizes language diversity for humans and enables cross-lingual applications for machines.
Contribution
The paper presents a large-scale multilingual lexical database, together with tools and data for visualizing and exploiting language diversity across more than a thousand languages.
Findings
Provides access to millions of words and meanings
Shows phenomena like shared meanings and cognate clusters
Enables cross-lingual applications
Abstract
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.
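The abstract lists lexical gaps, shared interlingual meanings and lexicon similarities among the phenomena the UKC makes visible. Below is a minimal sketch of how such notions could be computed once lexical data is available as concept-to-word mappings. It assumes a hypothetical in-memory format (language code → {concept ID → lemma}), not the actual LiveLanguage Catalogue format. All language codes, concept IDs and lemmas are invented toy data. Jaccard overlap of concept sets stands in as a simple similarity measure and is not necessarily the one the UKC uses.

```python
from itertools import combinations

# Hypothetical toy data: language code -> {concept_id: lemma}.
# Codes, concept IDs and lemmas are invented for illustration;
# they are NOT taken from the UKC or the LiveLanguage Catalogue.
LEXICONS: dict[str, dict[str, str]] = {
    "xx": {"c_brother": "bro", "c_sibling": "sib", "c_cousin": "cou"},
    "yy": {"c_brother": "bra", "c_older_brother": "obr", "c_sibling": "sip"},
    "zz": {"c_brother": "bre", "c_older_brother": "old"},
}


def lexical_gaps(lexicons: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Per language: concepts lexicalized in some other language but not in this one."""
    all_concepts: set[str] = set().union(*(set(lex) for lex in lexicons.values()))
    return {lang: all_concepts - set(lex) for lang, lex in lexicons.items()}


def shared_concepts(lexicons: dict[str, dict[str, str]]) -> set[str]:
    """Concepts lexicalized in every language (a crude proxy for shared meanings)."""
    return set.intersection(*(set(lex) for lex in lexicons.values()))


def jaccard_similarity(lexicons: dict[str, dict[str, str]], a: str, b: str) -> float:
    """Jaccard overlap of the concept sets of languages a and b (stand-in measure)."""
    ca, cb = set(lexicons[a]), set(lexicons[b])
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


if __name__ == "__main__":
    gaps = lexical_gaps(LEXICONS)
    print("gaps:", {lang: sorted(g) for lang, g in gaps.items()})
    print("shared:", sorted(shared_concepts(LEXICONS)))
    # Pairwise lexicon similarity over all language pairs.
    for a, b in combinations(sorted(LEXICONS), 2):
        print(f"{a}-{b}: {jaccard_similarity(LEXICONS, a, b):.2f}")
```

With this toy data, `xx` has a gap for `c_older_brother`, only `c_brother` is lexicalized in all three languages, and `yy`/`zz` form the most similar pair (2/3). Representing lexicons as sets of concept IDs keeps gaps, shared meanings and similarity as plain set operations, which is why a concept-aligned resource makes these comparisons straightforward.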
Taxonomy
Topics: Natural Language Processing Techniques
