Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA
Renhao Pei, Siyao Peng, Verena Blaschke, Robert Litschko, Barbara Plank

TL;DR
This paper investigates how large language models handle information asymmetry between local and standard language varieties, revealing significant performance gaps and the potential of local Wikipedia editions to improve knowledge coverage.
Contribution
The paper introduces a manually constructed QA dataset of information that appears in local Wikipedia editions (Cantonese, Bavarian) but is absent from their standard-variety counterparts (Mandarin Chinese, German), and evaluates how well LLMs answer these questions.
Findings
LLMs struggle with questions about information that appears only in local Wikipedia editions.
Providing context from Wikipedia lead sections substantially improves LLM performance, with further gains possible via translation.
Local Wikipedia editions enhance regional and global knowledge coverage.
Abstract
Large Language Models (LLMs) are becoming a common way for humans to seek knowledge, yet their coverage and reliability vary widely. Especially for local language varieties, there are large asymmetries, e.g., information in local Wikipedia that is absent from the standard variant. However, little is known about how well LLMs perform under such information asymmetry, especially on closely related languages. We manually construct a novel challenge question-answering (QA) dataset that captures knowledge conveyed on local Wikipedia pages that is absent from their higher-resource counterparts, covering Mandarin Chinese vs. Cantonese and German vs. Bavarian. Our experiments show that LLMs fail to answer questions about information only in local editions of Wikipedia. Providing context from lead sections substantially improves performance, with further gains possible via translation. Our…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Wikis in Education and Collaboration
