A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models
Yefeng Yuan, Yuhong Liu, Liang Cheng

TL;DR
This paper introduces SynEval, a comprehensive evaluation framework for assessing the quality, utility, and privacy of synthetic tabular data generated by large language models, validated on product review datasets.
Contribution
The paper presents SynEval, a novel open-source framework that quantitatively evaluates synthetic data from LLMs, addressing the lack of comprehensive assessment tools.
Findings
SynEval effectively measures fidelity, utility, and privacy of synthetic data.
The evaluation highlights trade-offs between the different metrics used to judge synthetic data quality.
SynEval aids researchers in selecting suitable synthetic data for specific applications.
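To make the three dimensions named above concrete, here is a minimal Python sketch of one simple proxy for each: a total variation distance for fidelity, a train-on-synthetic, test-on-real accuracy for utility, and an exact-copy rate for privacy. This is not SynEval's API, and this summary does not say which metrics SynEval actually uses; the function names, the toy rows, and the choice of proxies are all assumptions made for illustration.

```python
from collections import Counter


def total_variation_distance(real_values, synth_values):
    """Fidelity proxy (assumed, not SynEval's metric):
    0.0 means identical category frequencies, 1.0 means disjoint."""
    real_freq = Counter(real_values)
    synth_freq = Counter(synth_values)
    categories = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq[c] / len(real_values) - synth_freq[c] / len(synth_values))
        for c in categories
    )


def tstr_accuracy(synth_rows, real_rows, feature, label):
    """Utility proxy (assumed): train on synthetic rows, test on real rows.
    The 'model' is a lookup of the most common label per feature value."""
    votes = {}
    for row in synth_rows:
        votes.setdefault(row[feature], Counter())[row[label]] += 1
    model = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    correct = sum(model.get(row[feature]) == row[label] for row in real_rows)
    return correct / len(real_rows)


def exact_match_rate(real_rows, synth_rows):
    """Privacy proxy (assumed): share of synthetic rows copying a real row verbatim."""
    real_set = {tuple(sorted(row.items())) for row in real_rows}
    copied = sum(tuple(sorted(row.items())) in real_set for row in synth_rows)
    return copied / len(synth_rows)


# Toy, made-up product-review rows (hypothetical; not data from the paper).
real = [
    {"rating": 5, "sentiment": "pos"},
    {"rating": 4, "sentiment": "pos"},
    {"rating": 1, "sentiment": "neg"},
    {"rating": 2, "sentiment": "neg"},
]
synth = [
    {"rating": 5, "sentiment": "pos"},
    {"rating": 5, "sentiment": "pos"},
    {"rating": 1, "sentiment": "neg"},
    {"rating": 3, "sentiment": "neg"},
]

fidelity_gap = total_variation_distance(
    [r["rating"] for r in real], [s["rating"] for s in synth]
)
utility = tstr_accuracy(synth, real, feature="rating", label="sentiment")
leakage = exact_match_rate(real, synth)
print(fidelity_gap, utility, leakage)
```

On this toy data the synthetic set scores poorly on the privacy proxy because most of its rows repeat real rows, which hints at the kind of tension between metrics that the findings mention.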
Abstract
The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework, SynEval, by applying it to synthetic product…
Taxonomy
Topics: Topic Modeling · Data Quality and Management
