Crowdsourcing Lexical Diversity
Hadi Khalilia, Jahna Otterbacher, Gabor Bella, Shandy Darma, Fausto Giunchiglia

TL;DR
This paper introduces a crowdsourcing approach and platform to identify and reduce bias in lexical-semantic resources across languages, focusing on lexical gaps and language-specific concepts, validated through food-related case studies.
Contribution
It presents a novel crowdsourcing methodology and platform for detecting lexical gaps and bias in multilingual lexical resources, demonstrated through two case studies.
Findings
Identified 2,140 lexical gaps in the English-Arabic comparison.
Found 951 lexical gaps in the Indonesian-Banjarese comparison.
Validated the effectiveness of the crowdsourcing method and platform.
Abstract
Lexical-semantic resources (LSRs), such as online lexicons and wordnets, are fundamental to natural language processing applications as well as to fields such as linguistic anthropology and language preservation. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual lexical gaps, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical…
Taxonomy
Topics: Linguistics, Language Diversity, and Identity
