On the effective transfer of knowledge from English to Hindi Wikipedia
Paramita Das, Amartya Roy, Ritabrata Chakraborty, Animesh Mukherjee

TL;DR
This paper presents a lightweight framework that improves Hindi Wikipedia content by transferring knowledge from English, utilizing external resources and large language models to generate and adapt content, significantly reducing content gaps.
Contribution
The paper introduces a novel framework combining external resource extraction, content adaptation, and machine translation to enhance Hindi Wikipedia articles from English sources.
Findings
The framework improved the content quality of Hindi Wikipedia articles by 65% and 62%.
The framework effectively adapts and translates content, improving coverage.
Both automatic and human evaluations confirm the quality improvements.
Abstract
Although Wikipedia is the largest multilingual encyclopedia, it remains inherently incomplete. There is a significant disparity in content quality between high-resource languages (HRLs, e.g., English) and low-resource languages (LRLs, e.g., Hindi), with many LRL articles lacking adequate information. To bridge these content gaps, we propose a lightweight framework to enhance knowledge equity between English and Hindi. If the English Wikipedia page is not up to date, our framework extracts relevant information from readily available external resources (such as English books) and, using the in-context learning capabilities of large language models, adapts it to Wikipedia's distinctive style, including its neutral point of view (NPOV) policy. The adapted content is then machine-translated into Hindi for integration into the corresponding Wikipedia articles. On…
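The extract, adapt, and translate flow described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: `llm` and `translate_en_hi` are placeholder callables for whatever large language model and English-to-Hindi machine translation system is used; `select_relevant` is a hypothetical keyword-overlap filter standing in for the paper's extraction step; the prompt wording and the `Demo` record are illustrative; and the branch that translates an up-to-date English section directly is an assumption, since the abstract above is truncated.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Demo:
    """One in-context demonstration (illustrative): raw text and its Wikipedia-style rewrite."""
    raw: str
    wiki_style: str


def select_relevant(passages: Sequence[str], section_title: str, min_overlap: int = 1) -> List[str]:
    """Hypothetical relevance filter: keep passages sharing at least
    `min_overlap` lowercase words with the section title."""
    title_words = set(section_title.lower().split())
    return [p for p in passages if len(title_words & set(p.lower().split())) >= min_overlap]


def build_prompt(demos: Sequence[Demo], passage: str) -> str:
    """Few-shot prompt: an NPOV instruction, the demonstrations, then the passage to adapt."""
    parts = ["Rewrite the text in encyclopedic Wikipedia style, "
             "following the neutral point of view (NPOV) policy."]
    for d in demos:
        parts.append(f"Text: {d.raw}\nWikipedia style: {d.wiki_style}")
    parts.append(f"Text: {passage}\nWikipedia style:")
    return "\n\n".join(parts)


def enrich_hindi_section(
    section_title: str,
    english_section: str,
    english_up_to_date: bool,
    external_passages: Sequence[str],
    demos: Sequence[Demo],
    llm: Callable[[str], str],
    translate_en_hi: Callable[[str], str],
) -> str:
    """Return Hindi text for one section (sketch of the adapt-then-translate flow)."""
    if english_up_to_date:
        # Assumption: an up-to-date English section is translated directly.
        english_text = english_section
    else:
        # Extract relevant external passages, then adapt each one to
        # Wikipedia style via in-context learning with the LLM.
        relevant = select_relevant(external_passages, section_title)
        english_text = "\n".join(llm(build_prompt(demos, p)) for p in relevant)
    # Machine-translate the English content into Hindi.
    return translate_en_hi(english_text)
```

Passing stub functions for `llm` and `translate_en_hi` (for example, one that uppercases the passage and one that prefixes `[hi]`) is enough to exercise the control flow without calling any real model. Keeping the model and the translator as injected callables makes it easy to swap either component without touching the pipeline logic.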
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices
Methods: ALIGN
