Generative Adversarial Nets for Multiple Text Corpora

Baiyang Wang; Diego Klabjan

arXiv:1712.09127 · cs.CL · December 27, 2017

TL;DR

This paper uses GANs on multiple text corpora to build consistent cross-corpus word embeddings and robust bag-of-words document embeddings, and shows that both kinds of embeddings improve supervised learning tasks.

Contribution

It introduces two GAN models for multiple text corpora: one produces consistent cross-corpus word embeddings from the separate word embeddings of each corpus, and one generates robust bag-of-words document embeddings for each corpus.

Findings

1. GANs improve the consistency of cross-corpus word embeddings
2. GAN-generated bag-of-words document embeddings improve supervised learning
3. Both models are demonstrated on real-world text data sets from different corpora

Abstract

Generative adversarial nets (GANs) have been successfully applied to the artificial generation of image data. In terms of text data, much has been done on the artificial generation of natural language from a single corpus. We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given different word embeddings per corpus; (2) the generation of robust bag-of-words document embeddings for each corpus. We demonstrate our GAN models on real-world text data sets from different corpora, and show that embeddings from both models lead to improvements in supervised learning problems.
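To make the adversarial setup for bag-of-words document embeddings concrete, here is a minimal, generic sketch of GAN training on normalized bag-of-words vectors. It assumes a single corpus, a generator softmax(W z + b) that maps Gaussian noise to a distribution over the vocabulary, and a logistic-regression discriminator, with gradients derived by hand. It is not the authors' model: their multi-corpus architectures are described in the paper and the listed repository. The names `train_bow_gan`, `normalize`, and the toy corpus are hypothetical.

```python
import math
import random


def softmax(a):
    m = max(a)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def normalize(counts):
    # Hypothetical helper: raw word counts -> L1-normalized bag-of-words vector.
    total = sum(counts)
    return [c / total for c in counts]


def train_bow_gan(docs, noise_dim=3, steps=200, lr=0.1, seed=0):
    """Toy GAN (an assumption, not the paper's model).

    Generator:     x = softmax(W z + b), a distribution over the vocabulary.
    Discriminator: D(x) = sigmoid(v . x + c), probability that x is real.
    """
    rng = random.Random(seed)
    vocab = len(docs[0])
    W = [[rng.gauss(0, 0.1) for _ in range(noise_dim)] for _ in range(vocab)]
    b = [0.0] * vocab
    v = [0.0] * vocab
    c = 0.0

    def generate(z):
        return softmax([dot(W[i], z) + b[i] for i in range(vocab)])

    for _ in range(steps):
        x_real = rng.choice(docs)
        z = [rng.gauss(0, 1) for _ in range(noise_dim)]
        x_fake = generate(z)

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        g_real = sigmoid(dot(v, x_real) + c) - 1.0  # dLoss/dlogit for the real doc
        g_fake = sigmoid(dot(v, x_fake) + c)        # dLoss/dlogit for the fake doc
        for i in range(vocab):
            v[i] -= lr * (g_real * x_real[i] + g_fake * x_fake[i])
        c -= lr * (g_real + g_fake)

        # Generator step (non-saturating loss): minimize -log D(G(z)).
        d_logit = sigmoid(dot(v, x_fake) + c) - 1.0
        g_x = [d_logit * vi for vi in v]            # dLoss/dx_fake
        inner = dot(g_x, x_fake)
        # Backpropagate through softmax: dLoss/da_i = x_i * (g_i - sum_j g_j x_j).
        g_a = [x_fake[i] * (g_x[i] - inner) for i in range(vocab)]
        for i in range(vocab):
            for j in range(noise_dim):
                W[i][j] -= lr * g_a[i] * z[j]
            b[i] -= lr * g_a[i]

    def sample():
        # Draw one synthetic bag-of-words document embedding.
        return generate([rng.gauss(0, 1) for _ in range(noise_dim)])

    return sample


# Hypothetical toy corpus: word counts over a 4-word vocabulary.
corpus = [normalize(c) for c in ([3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0])]
sample = train_bow_gan(corpus)
doc = sample()
print([round(p, 3) for p in doc])
```

Because the generator ends in a softmax, every sample is a valid normalized bag-of-words vector, which keeps generated documents comparable to the real, L1-normalized ones the discriminator sees.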

Code & Models

Repositories

baiyangwang/emgan (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications