TL;DR
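This paper introduces RDDG (Relational Data generator with Dynamic Guidance), an in-context learning framework that uses Bayesian calibration and self-reinforcing feedback to generate relational/structured tabular data for imbalanced classification tasks. It reports higher data fidelity and better downstream classification performance than existing methods.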
Contribution
The work presents a unified in-context learning framework for relational data synthesis with a feedback mechanism that guides the LLM to keep improving the quality of the generated data throughout the synthesis process, something existing LLM-based approaches lack.
Findings
RDDG outperforms existing methods in data fidelity.
RDDG improves downstream imbalanced classification performance.
The framework effectively preserves attribute correlations in generated data.
Abstract
Imbalanced data are commonly present in real-world applications. While data synthesis can effectively mitigate data scarcity for rare classes, and LLMs have revolutionized text generation, the application of LLMs to the synthesis of relational/structured tabular data remains underexplored. Moreover, existing approaches lack an effective feedback mechanism to guide LLMs in continuously optimizing the quality of the generated data throughout the synthesis process. In this work, we propose RDDG, Relational Data generator with Dynamic Guidance, which is a unified in-context learning framework that employs progressive chain-of-thought (CoT) steps to generate tabular data for enhancing downstream imbalanced classification performance. RDDG first uses core set selection to identify representative samples from the original data, then utilizes in-context learning to discover the inherent…
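The abstract's first step, core set selection, can be illustrated with a small sketch. The abstract does not say which selection algorithm RDDG uses, so the code below swaps in greedy k-center (farthest-point) selection purely as an assumption; the function name `k_center_greedy`, the `start` parameter, and the toy `minority_rows` data are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def k_center_greedy(points: Sequence[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Return indices of k representative points via greedy farthest-point selection.

    Assumption: this stands in for the paper's (unspecified) core set step.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected centre so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is currently covered worst by the selected set.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical numeric features of rare-class rows (illustrative only).
minority_rows = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # cluster near the origin
    (5.0, 5.0), (5.1, 5.0),               # cluster near (5, 5)
    (10.0, 0.0),                          # isolated sample
]
core_idx = k_center_greedy(minority_rows, k=3)
print(core_idx)  # [0, 5, 3]
```

With k=3 the sketch picks one row from each region of the toy data, which is the kind of compact, representative subset that could then be placed in an LLM prompt as in-context examples. Real tabular data would need categorical attributes encoded and numeric ones scaled before distances are meaningful.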
