Learning from Multiple Sources for Data-to-Text and Text-to-Data

Song Duong; Alberto Lumbreras; Mike Gartrell; Patrick Gallinari

arXiv:2302.11269·cs.LG·February 23, 2023

TL;DR

This paper introduces a variational auto-encoder model that learns from multiple heterogeneous sources to improve data-to-text and text-to-data tasks, addressing limitations of source-specific tuning and data scarcity.

Contribution

The paper proposes a model that jointly handles the data-to-text (D2T) and text-to-data (T2D) tasks across multiple sources, capturing their diversity with disentangled style and content variables.

Findings

01 Model closes performance gap with single-source systems.

02 Outperforms single-source models in some cases.

03 Effectively learns from non-parallel, multi-source corpora.

Abstract

Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables, into fluent text, and vice versa. These tasks are usually handled separately and use corpora extracted from a single source. Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks. This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora. This paper considers a more general scenario where data are available from multiple heterogeneous sources. Each source, with its specific data format and semantic domain, provides a non-parallel corpus of text and structured data. We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of…
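
As a rough illustration of what "disentangled style and content variables" can mean in a variational auto-encoder, the sketch below keeps two separate latent vectors and sums their KL regularizers. It is not the authors' model: the diagonal Gaussian posteriors, the standard normal priors, and every name (`sample_gaussian`, `kl_to_standard_normal`, `latent_regularizer`) are assumptions made for this example, since the abstract does not specify them.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def latent_regularizer(content_params, style_params):
    """Sum of the KL terms of the (hypothetical) content and style posteriors."""
    return (kl_to_standard_normal(*content_params)
            + kl_to_standard_normal(*style_params))


rng = random.Random(0)

# Each posterior is a (mean, log-variance) pair; the values are made up.
content = ([0.0, 0.0], [0.0, 0.0])  # identical to the prior, so its KL is 0
style = ([1.0], [0.0])              # unit variance, mean 1, so its KL is 0.5

z_content = sample_gaussian(*content, rng)  # would encode "what is said"
z_style = sample_gaussian(*style, rng)      # would encode source and format
reg = latent_regularizer(content, style)
```

In a full model, a decoder would reconstruct text or structured data from both `z_content` and `z_style`, and `reg` would be added to the reconstruction loss. Keeping the two latents separate is what would let content be shared across sources while style stays source-specific.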

Code & Models

Repositories

sngdng/msunsupvae (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management