MultiWiki: Interlingual Text Passage Alignment in Wikipedia
Simon Gottschalk, Elena Demidova

TL;DR
MultiWiki introduces a novel method for aligning interlingual Wikipedia article passages, enhancing cross-language understanding and analysis by balancing precision and overview through semantic similarity and greedy algorithms.
Contribution
The paper presents MultiWiki, a new approach that combines semantic similarity and greedy algorithms for effective interlingual text passage alignment in Wikipedia.
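As a rough illustration of how a similarity score and a greedy selection step can be combined, the sketch below aligns passages from two articles one-to-one. It is not the authors' implementation: the `jaccard` token-overlap score is only a stand-in for a semantic similarity measure, the `threshold` value is arbitrary, and both function names are hypothetical. It also assumes the passages have already been brought into a common language.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy similarity: token overlap between two passages.

    Assumption: a placeholder for a real semantic similarity measure.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def greedy_align(
    left: List[str], right: List[str], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Greedily pair passages, best-scoring pair first, each passage used once."""
    # Score every cross-article pair and sort from most to least similar.
    scored = sorted(
        ((jaccard(l, r), i, j) for i, l in enumerate(left) for j, r in enumerate(right)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used_left, used_right = set(), set()
    alignment = []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs are too dissimilar
        if i in used_left or j in used_right:
            continue  # one of the passages is already aligned
        used_left.add(i)
        used_right.add(j)
        alignment.append((i, j, score))
    return alignment
```

Raising `threshold` in this sketch yields fewer, more confident pairs, while splitting the input into shorter or longer passages changes the granularity of what gets aligned.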
Findings
Achieved precise alignment results consistent with user annotations
Supported four language pairs in the demonstration
Collected a user-annotated benchmark for evaluation
Abstract
In this article we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights in cultural differences and build a basis for qualitative analysis of the articles. An important challenge in this context is the trade-off between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a…
