Learning from Multiple Sources for Data-to-Text and Text-to-Data
Song Duong, Alberto Lumbreras, Mike Gartrell, Patrick Gallinari

TL;DR
This paper introduces a variational auto-encoder model that learns from multiple heterogeneous sources to improve data-to-text and text-to-data tasks, addressing limitations of source-specific tuning and data scarcity.
Contribution
It proposes a novel model that jointly handles data-to-text (D2T) and text-to-data (T2D) tasks across multiple sources, capturing source diversity with disentangled style and content variables.
Findings
Closes the performance gap with single-source systems.
Outperforms single-source models in some cases.
Learns effectively from non-parallel, multi-source corpora.
Abstract
Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables, into fluent text, and vice versa. These tasks are usually handled separately and use corpora extracted from a single source. Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks. This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora. This paper considers a more general scenario where data are available from multiple heterogeneous sources. Each source, with its specific data format and semantic domain, provides a non-parallel corpus of text and structured data. We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of…
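The abstract is cut off before it describes the training objective, so the following is only a generic sketch, not the authors' method. It shows a standard variational auto-encoder loss (the negative evidence lower bound) with two latent variables, a content variable z_c and a style variable z_s. It assumes diagonal Gaussian posteriors q(z_c|x) and q(z_s|x), a posterior that factorizes over the two variables, and standard normal priors. Under these assumptions the KL term splits into one closed-form term per variable. The function names (gaussian_kl, reparameterize, negative_elbo) and the placeholder reconstruction log-likelihood are hypothetical illustrations.

```python
import math
import random


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def negative_elbo(recon_log_likelihood, content_params, style_params):
    """Negative ELBO for a two-latent VAE.

    Assumption (not confirmed by the source): q(z_c, z_s | x) = q(z_c | x) q(z_s | x)
    and p(z_c, z_s) = N(0, I) N(0, I), so the KL term is a sum of two KL terms.
    Each *_params argument is a (mu, logvar) pair of lists.
    """
    kl_content = gaussian_kl(*content_params)
    kl_style = gaussian_kl(*style_params)
    return -recon_log_likelihood + kl_content + kl_style


if __name__ == "__main__":
    rng = random.Random(0)
    content = ([0.5, -0.2], [0.0, -1.0])  # hypothetical encoder outputs for z_c
    style = ([0.0, 0.0], [0.0, 0.0])      # hypothetical encoder outputs for z_s
    z_c = reparameterize(*content, rng)
    z_s = reparameterize(*style, rng)
    # A real decoder would score x given (z_c, z_s); -3.0 is a placeholder log-likelihood.
    print(z_c, z_s, negative_elbo(-3.0, content, style))
```

Keeping the two KL terms separate makes it easy to weight or regularize the style and content variables independently, which is one common motivation for a disentangled latent space.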
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
