XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages
Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma

TL;DR
This paper extends a multilingual fact-to-text generation dataset with four additional low-resource languages and evaluates Transformer-based models. It finds that a multilingual mT5 model with fact-aware and structure-aware inputs performs best.
Contribution
The paper expands the XALIGN dataset to twelve languages, conducts an extensive evaluation of Transformer-based models, and identifies which model and input-encoding choices work best for cross-lingual fact-to-text generation.
Findings
Multilingual mT5 with fact-aware embeddings performs best.
Extended dataset XALIGNV2 includes 12 languages.
Structure-aware input encoding improves generation quality.
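The summary does not say exactly how the facts are fed to mT5. As a minimal sketch, assuming each fact is a (subject, relation, object) triple, the Python below linearizes a set of facts into one input sequence and builds two parallel id sequences: a role id per token (a stand-in for structure-aware encoding) and a fact index per token (a stand-in for fact-aware embeddings). The `generate <lang>` prompt, the `<H>`/`<R>`/`<T>` markers, the `Fact` type, the `ROLE_IDS` table, and the `linearize` function are all hypothetical illustrations, not the paper's exact input format.

```python
from dataclasses import dataclass

# Hypothetical role ids for a structure-aware encoding (an assumption,
# not the paper's exact scheme): 0 marks prompt/marker tokens.
ROLE_IDS = {"marker": 0, "head": 1, "relation": 2, "tail": 3}


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


def linearize(facts, lang):
    """Flatten facts into whitespace tokens plus parallel role ids and fact ids.

    Fact id 0 is reserved for the language prompt; facts are numbered from 1.
    """
    tokens = ["generate", lang]  # hypothetical target-language prompt
    role_ids = [ROLE_IDS["marker"]] * 2
    fact_ids = [0, 0]
    for idx, fact in enumerate(facts, start=1):
        for marker, role, text in (("<H>", "head", fact.subject),
                                   ("<R>", "relation", fact.relation),
                                   ("<T>", "tail", fact.obj)):
            words = text.split()
            # The marker token belongs to the fact but carries the marker role.
            tokens.append(marker)
            role_ids.append(ROLE_IDS["marker"])
            fact_ids.append(idx)
            # Every word of the span gets the span's role and the fact's index.
            tokens.extend(words)
            role_ids.extend([ROLE_IDS[role]] * len(words))
            fact_ids.extend([idx] * len(words))
    return tokens, role_ids, fact_ids


if __name__ == "__main__":
    facts = [Fact("Marie Curie", "occupation", "physicist"),
             Fact("Marie Curie", "award received", "Nobel Prize in Physics")]
    tokens, roles, fids = linearize(facts, "en")
    print(" ".join(tokens))
    print(roles)
    print(fids)
```

Keeping role and fact ids as sequences parallel to the tokens is one way such information could be turned into learned embeddings added to the encoder's token embeddings, rather than relying on marker tokens alone. A real pipeline would also need to align these ids with mT5's subword tokens instead of whitespace-split words.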
Abstract
Multiple business scenarios require automated generation of descriptive, human-readable text from structured input data. Hence, fact-to-text generation systems have been developed for various downstream tasks like generating soccer reports, weather and financial reports, medical reports, person biographies, etc. Unfortunately, previous work on fact-to-text (F2T) generation has focused primarily on English, mainly due to the high availability of relevant datasets. Only recently was the problem of cross-lingual fact-to-text (XF2T) generation proposed for generation across multiple languages, along with a dataset, XALIGN, for eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend the XALIGN dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese and Oriya. We conduct an extensive study using popular Transformer-based text…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Adafactor · Residual Connection · Attention Dropout · Gated Linear Unit
