XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
Tushar Abhishek, Shivprasad Sagare, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma

TL;DR
XAlign introduces a new dataset and two unsupervised alignment methods for cross-lingual fact-to-text generation, in which English fact triples are turned into descriptive text in low-resource languages.
Contribution
The paper presents what the authors believe to be the first cross-lingual fact-to-text alignment and generation methods for low-resource languages, along with a 0.45M-pair dataset and baseline generation models.
Findings
The XAlign dataset contains 0.45M pairs across 8 languages.
Of these, 5402 pairs have been manually annotated.
Baseline models demonstrate the feasibility of cross-lingual generation.
Abstract
Multiple critical scenarios (like Wikipedia text generation given English Infoboxes) need automated generation of descriptive text in low-resource (LR) languages from English fact triples. Previous work has focused on English fact-to-text (F2T) generation. To the best of our knowledge, there has been no previous attempt at cross-lingual alignment or generation for LR languages. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on the XAlign dataset.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
