Curriculum Learning for Cross-Lingual Data-to-Text Generation With Noisy Data

Kancharla Aditya Hari; Manish Gupta; Vasudeva Varma

arXiv:2412.13484·cs.CL·December 19, 2024

TL;DR

This paper introduces a curriculum learning approach for cross-lingual data-to-text generation with noisy data, improving output quality and faithfulness across multiple languages by using alignment-based sample ordering and annealing schedules.

Contribution

The paper proposes curriculum criteria tailored to cross-lingual DTG with noisy data and reports BLEU gains of up to 4 points, along with 5-15% average improvements in faithfulness and coverage.

Findings

01 BLEU score increased by up to 4 points

02 Faithfulness and coverage improved by 5-15% on average

03 Effective across 11 Indian languages and English, on 2 separate datasets

Abstract

Curriculum learning has been used in various tasks to improve the quality of text generation systems by ordering the training samples according to a particular schedule. In the context of data-to-text generation (DTG), previous studies used various difficulty criteria to order the training samples for monolingual DTG. These criteria, however, do not generalize to the cross-lingual variant of the problem and do not account for noisy data. We explore multiple criteria that can be used to improve the performance of cross-lingual DTG systems with noisy data using two curriculum schedules. Using the alignment score criterion for ordering samples and an annealing schedule to train the model, we show an increase in BLEU score of up to 4 points, and improvements in faithfulness and coverage of generations of 5-15% on average, across 11 Indian languages and English on 2 separate datasets. We make…
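The abstract names an alignment score criterion and an annealing schedule but does not spell out either. The following is a minimal sketch of one plausible reading, not the authors' implementation: each sample carries a precomputed alignment score (higher meaning the text matches its source data more closely), and the training pool shrinks each epoch toward the best-aligned samples. The function names, the linear decay, and the `min_fraction` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_fraction(epoch: int, num_epochs: int, min_fraction: float = 0.5) -> float:
    """Fraction of the data kept at a given epoch.

    Assumed schedule: decays linearly from 1.0 (first epoch) to
    min_fraction (last epoch). The paper's actual schedule may differ.
    """
    if num_epochs <= 1:
        return 1.0
    progress = epoch / (num_epochs - 1)
    return 1.0 - progress * (1.0 - min_fraction)


def annealed_subset(
    samples: Sequence[T],
    scores: Sequence[float],
    epoch: int,
    num_epochs: int,
    min_fraction: float = 0.5,
) -> List[T]:
    """Return the best-aligned samples to train on at this epoch.

    `scores` are hypothetical alignment scores, one per sample,
    where a higher score means a cleaner (better aligned) pair.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    # Rank sample indices from best aligned to worst aligned.
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    fraction = keep_fraction(epoch, num_epochs, min_fraction)
    k = max(1, round(fraction * len(samples)))
    return [samples[i] for i in order[:k]]


if __name__ == "__main__":
    data = ["a", "b", "c", "d"]
    align = [0.9, 0.1, 0.5, 0.7]
    for epoch in range(3):
        print(epoch, annealed_subset(data, align, epoch, num_epochs=3))
```

Under these assumptions, early epochs see the full (noisy) dataset and later epochs concentrate on the pairs with the highest alignment scores. A schedule that instead starts from the cleanest samples and grows the pool would reverse the direction of `keep_fraction`.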

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems