Understanding the Properties of Generated Corpora
Naama Zwerdling, Segev Shlomov, Esther Goldbraich, George Kour, Boaz Carmeli, Naama Tepper, Inbal Ronen, Vitaly Zabershinsky, Ateret Anaby-Tavor

TL;DR
This paper introduces tools to analyze the properties of automatically generated text corpora, revealing significant differences between leading generative models and enhancing understanding of their outputs.
Contribution
It presents novel tools for analyzing generated text corpora and applies them to compare different generative technologies, providing new insights into their properties.
Findings
Significant differences in generated corpora by different models
Tools reveal detailed corpus properties
Enhanced understanding of generative model outputs
Abstract
Models for text generation have become focal for many research tasks and especially for the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools to various generated corpora allowed us to gain new insights into the properties of the generative models. As part of our characterization process, we found remarkable differences in the corpora generated by two leading generative technologies.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
