InfoSync: Information Synchronization across Multilingual Semi-structured Tables
Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang

TL;DR
This paper introduces InfoSync, a new dataset and a two-step method for synchronizing semi-structured tables across multiple languages, improving data consistency in multilingual Wikipedia tables.
Contribution
The paper presents a novel dataset, InfoSync, and a two-step synchronization method for multilingual semi-structured tables, addressing a key challenge in cross-lingual data consistency.
Findings
Information alignment F1 score of 87.91 for en <-> non-en
77.28% acceptance rate for human-assisted Wikipedia Infobox edits (603 table pairs)
Effective cross-lingual table synchronization demonstrated
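As a rough illustration of how an alignment F1 score such as the one reported above can be computed, the sketch below treats alignment as a set of (English row key, non-English row key) pairs and compares predicted pairs against gold pairs. The function name `alignment_f1` and the toy row keys are hypothetical, and the paper's exact evaluation protocol is not described here, so this is only the standard precision/recall/F1 definition applied to row pairs.

```python
from typing import Set, Tuple

RowPair = Tuple[str, str]  # (English row key, non-English row key)


def alignment_f1(predicted: Set[RowPair], gold: Set[RowPair]) -> float:
    """Standard F1 over sets of aligned row pairs (illustrative only)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy English <-> French infobox keys (made up for this example).
gold = {
    ("Born", "Naissance"),
    ("Nationality", "Nationalité"),
    ("Occupation", "Profession"),
}
predicted = {
    ("Born", "Naissance"),
    ("Nationality", "Nationalité"),
    ("Born", "Profession"),  # a wrong alignment
}

score = alignment_f1(predicted, gold)
print(f"F1 = {score:.4f}")
```

Here two of the three predicted pairs are correct and two of the three gold pairs are recovered, so precision and recall are both 2/3.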
Abstract
Information Synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized across languages. To address this problem, we introduce a new dataset InfoSync and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update to fill in missing or outdated information in the aligned multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the…
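The two steps named in the abstract can be pictured with a minimal sketch. This is not the authors' method: the abstract does not say how rows are aligned or updated, so the example below assumes a hypothetical key dictionary (`KEY_MAP`) that maps French row keys to English ones, and it only flags rows missing from the target table. The names `align_rows` and `suggest_updates` and the toy infoboxes are all made up for illustration.

```python
from typing import Dict, List, Tuple

Table = Dict[str, str]  # row key -> row value

# Hypothetical French -> English key dictionary (an assumption, not from the paper).
KEY_MAP = {
    "Naissance": "Born",
    "Nationalité": "Nationality",
    "Profession": "Occupation",
}


def align_rows(src: Table, tgt: Table, key_map: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 1 (alignment): pair each target row with the source row its key maps to."""
    return [(key_map[k], k) for k in tgt if key_map.get(k) in src]


def suggest_updates(src: Table, tgt: Table, key_map: Dict[str, str]) -> Table:
    """Step 2 (update): return source rows that have no aligned row in the target."""
    aligned_src_keys = {s for s, _ in align_rows(src, tgt, key_map)}
    return {k: v for k, v in src.items() if k not in aligned_src_keys}


# Toy English and French infoboxes for the same entity.
en = {"Born": "1 May 1950", "Nationality": "French", "Occupation": "Writer"}
fr = {"Naissance": "1 mai 1950", "Nationalité": "Française"}

pairs = align_rows(en, fr, KEY_MAP)
missing = suggest_updates(en, fr, KEY_MAP)
print(pairs)    # aligned (English key, French key) pairs
print(missing)  # rows a human editor could translate and add to the French table
```

In this sketch the output of the update step is only a suggestion list, in keeping with the human-assisted edits the abstract describes; translating and inserting the value is left to an editor.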
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Biomedical Text Mining and Ontologies
