InfoSync: Information Synchronization across Multilingual Semi-structured Tables

Siddharth Khincha; Chelsi Jain; Vivek Gupta; Tushar Kataria; Shuo Zhang

arXiv:2307.03313 · cs.CL · July 10, 2023 · 1 citation

Open Access

TL;DR

This paper introduces InfoSync, a new dataset and a two-step method for synchronizing semi-structured tables across multiple languages, improving data consistency in multilingual Wikipedia tables.

Contribution

The paper presents a novel dataset, InfoSync, and a two-step synchronization method for multilingual semi-structured tables, addressing a key challenge in cross-lingual data consistency.

Findings

1. Information alignment F1 score of 87.91 for en <-> non-en table pairs
2. 77.28% acceptance rate for human-assisted Wikipedia edits (603 table pairs)
3. Effective cross-lingual table synchronization demonstrated on Wikipedia Infoboxes
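Alignment quality is reported as an F1 score. As a hedged illustration, assuming alignment is scored as set overlap between predicted and gold (source row, target row) pairs (the paper's exact scoring protocol may differ), micro F1 could be computed as below. The function name and the German row keys are hypothetical examples, not taken from the dataset.

```python
from typing import Set, Tuple

# Hypothetical representation: an alignment is a set of
# (source_row_key, target_row_key) pairs.
Alignment = Set[Tuple[str, str]]


def alignment_f1(predicted: Alignment, gold: Alignment) -> float:
    """Micro F1 between predicted and gold row-alignment pairs."""
    if not predicted and not gold:
        return 1.0  # nothing to align and nothing predicted
    true_pos = len(predicted & gold)  # pairs that appear in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = {("born", "geboren"), ("spouse", "ehepartner"), ("occupation", "beruf")}
pred = {("born", "geboren"), ("spouse", "beruf")}
print(round(alignment_f1(pred, gold), 4))  # 0.4
```

Here one of two predicted pairs is correct (precision 0.5) and one of three gold pairs is recovered (recall 1/3), giving F1 = 0.4.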

Abstract

Information synchronization of semi-structured data across languages is challenging. For instance, a Wikipedia table in one language should stay synchronized with its counterparts in other languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method consists of 1) Information Alignment, which maps rows across multilingual tables, and 2) Information Update, which fills in missing or outdated information in the aligned tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information updating, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the…
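The abstract describes the two steps only at a high level. The sketch below is not the authors' implementation: it assumes alignment by exact key match after a hypothetical German-to-English key dictionary (`KEY_TRANSLATIONS`), and it reduces the update step to proposing source rows that have no aligned counterpart. All function names, keys, and values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key translation table (German -> English); a real system
# would need machine translation or multilingual similarity instead.
KEY_TRANSLATIONS: Dict[str, str] = {
    "geboren": "born",
    "beruf": "occupation",
    "ehepartner": "spouse",
}


def normalize(key: str) -> str:
    return key.strip().lower()


def align_rows(src: Dict[str, str], tgt: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 1 (simplified): pair source and target rows whose keys match
    after lowercasing and dictionary translation of the target keys."""
    tgt_by_english = {
        KEY_TRANSLATIONS.get(normalize(k), normalize(k)): k for k in tgt
    }
    pairs = []
    for s_key in src:
        t_key = tgt_by_english.get(normalize(s_key))
        if t_key is not None:
            pairs.append((s_key, t_key))
    return pairs


def update_missing(src: Dict[str, str], tgt: Dict[str, str]) -> Dict[str, str]:
    """Step 2 (simplified): propose source rows with no aligned target row.
    Values are copied untranslated; outdated-value detection is omitted."""
    aligned_src = {s for s, _ in align_rows(src, tgt)}
    return {k: v for k, v in src.items() if k not in aligned_src}


en = {"Born": "1 January 1950", "Occupation": "Physicist", "Spouse": "Jane Doe"}
de = {"Geboren": "1. Januar 1950", "Beruf": "Physiker"}
print(align_rows(en, de))      # [('Born', 'Geboren'), ('Occupation', 'Beruf')]
print(update_missing(en, de))  # {'Spouse': 'Jane Doe'}
```

Splitting alignment from update keeps the second step simple: once rows are paired, anything left unpaired on one side is a candidate for insertion on the other, which a human editor can then review.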

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Biomedical Text Mining and Ontologies