Generative Adversarial Nets for Multiple Text Corpora
Baiyang Wang, Diego Klabjan

TL;DR
This paper explores the use of GANs to generate cross-corpus word and document embeddings from multiple text corpora, which improves results on supervised learning tasks.
Contribution
It introduces novel GAN models tailored for multiple text corpora, enabling consistent cross-corpus embeddings and robust document representations.
Findings
GANs improve cross-corpus word embedding consistency
GAN-generated document embeddings enhance supervised learning
Models demonstrate effectiveness on real-world datasets
Abstract
Generative adversarial nets (GANs) have been successfully applied to the artificial generation of image data. In terms of text data, much has been done on the artificial generation of natural language from a single corpus. We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given different word embeddings per corpus; (2) the generation of robust bag-of-words document embeddings for each corpus. We demonstrate our GAN models on real-world text data sets from different corpora, and show that embeddings from both models lead to improvements in supervised learning problems.
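To make the "bag-of-words" input in application (2) concrete, here is a minimal sketch of how documents from several corpora might be turned into normalized bag-of-words vectors over one shared vocabulary. It is not the authors' implementation and contains no GAN: the function names (`build_vocabulary`, `bag_of_words`), the whitespace tokenization, and the toy corpora are all assumptions made for illustration.

```python
from collections import Counter

def build_vocabulary(corpora):
    """Map every token seen in any corpus to a column index (hypothetical helper)."""
    tokens = sorted({tok
                     for docs in corpora.values()
                     for doc in docs
                     for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(doc, vocab_index):
    """Return relative term frequencies of one document over the shared vocabulary."""
    counts = Counter(doc.lower().split())
    total = sum(counts.values())
    vec = [0.0] * len(vocab_index)
    for tok, c in counts.items():
        if tok in vocab_index:
            vec[vocab_index[tok]] = c / total
    return vec

# Toy data: two corpora with different vocabularies (assumed, not from the paper).
corpora = {
    "news": ["stocks rise as markets rally", "markets fall"],
    "reviews": ["great movie great cast"],
}

vocab_index = build_vocabulary(corpora)
vectors = {
    name: [bag_of_words(doc, vocab_index) for doc in docs]
    for name, docs in corpora.items()
}
```

Because every corpus shares the same column layout, vectors from different corpora can be compared directly or fed to a single model; a real pipeline would likely add proper tokenization and vocabulary pruning.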
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
