Understanding the Properties of Generated Corpora

Naama Zwerdling; Segev Shlomov; Esther Goldbraich; George Kour; Boaz Carmeli; Naama Tepper; Inbal Ronen; Vitaly Zabershinsky; Ateret Anaby-Tavor

arXiv:2206.11219·cs.CL·October 28, 2022

TL;DR

This paper introduces tools to analyze the properties of automatically generated text corpora, revealing significant differences between leading generative models and enhancing understanding of their outputs.

Contribution

It presents novel tools for analyzing generated text corpora and applies them to compare different generative technologies, providing new insights into their properties.

Findings

01 Significant differences between corpora generated by different models

02 Tools reveal detailed corpus properties

03 Enhanced understanding of generative model outputs

Abstract

Models for text generation have become focal for many research tasks and especially for the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools to various generated corpora allowed us to gain new insights into the properties of the generative models. As part of our characterization process, we found remarkable differences in the corpora generated by two leading generative technologies.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems