Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration

Chongsheng Zhang; Hao Wang; Zelong Yu; Esteban Garces Arias; Julian Rodemann; Zhanshuo Zhang; Qilong Li; Gaojuan Fan; Krikamol Muandet; Christian Heumann

arXiv:2604.16817 · cs.LG · April 28, 2026



TL;DR

This paper introduces RDDG, a framework that uses Bayesian calibration and self-reinforcing feedback to generate high-quality relational (structured tabular) data for imbalanced classification tasks, outperforming existing methods.

Contribution

The work presents a unified in-context learning framework with a feedback mechanism that continuously improves the quality of synthesized relational data, addressing the lack of such feedback in current LLM-based approaches.

Findings

1. RDDG outperforms existing methods in data fidelity.
2. RDDG improves downstream imbalanced classification performance.
3. The framework effectively preserves attribute correlations in generated data.
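This page does not say how the authors measure correlation preservation. One generic check, assumed here for illustration and not taken from the paper, is to compare pairwise Pearson correlations between the real and synthetic columns and average the absolute differences. The function names `pearson` and `correlation_gap` are hypothetical; this is a minimal sketch, not RDDG's evaluation protocol.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / sqrt(vx * vy)


def correlation_gap(real, synth):
    """Mean absolute difference of pairwise column correlations (hypothetical metric).

    real, synth: dicts mapping column name -> list of numeric values,
    with the same column names. Lower is better; 0.0 means every
    pairwise correlation in `synth` matches the one in `real`.
    """
    cols = sorted(real)
    pairs = list(combinations(cols, 2))
    gaps = [abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
            for a, b in pairs]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    real = {"a": [1, 2, 3], "b": [2, 4, 6]}
    synth = {"a": [1, 2, 3], "b": [6, 4, 2]}  # correlation sign flipped
    print(correlation_gap(real, synth))  # 2.0
```

A gap near 0 suggests the synthetic rows keep the real attribute dependencies; a sign flip on a strongly correlated pair, as in the toy example, produces the maximum per-pair gap of 2.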

Abstract

Imbalanced data are commonly present in real-world applications. While data synthesis can effectively mitigate data scarcity for rare classes, and LLMs have revolutionized text generation, the application of LLMs to the synthesis of relational/structured tabular data remains underexplored. Moreover, existing approaches lack an effective feedback mechanism to guide LLMs in continuously optimizing the quality of the generated data throughout the synthesis process. In this work, we propose RDDG, Relational Data generator with Dynamic Guidance, which is a unified in-context learning framework that employs progressive chain-of-thought (CoT) steps to generate tabular data for enhancing downstream imbalanced classification performance. RDDG first uses core set selection to identify representative samples from the original data, then utilizes in-context learning to discover the inherent…
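The abstract does not specify which core-set selection method RDDG uses. Purely as an illustration, the sketch below swaps in greedy k-center selection, a standard coreset technique that repeatedly picks the row farthest from all rows chosen so far, so the selected rows spread across the data. The function name `k_center_coreset` is hypothetical, and it assumes rows are already encoded as numeric, comparably scaled feature tuples; it is not the authors' implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_center_coreset(points, k, start=0):
    """Greedy k-center selection (hypothetical helper, not RDDG's code).

    points: list of numeric feature tuples, assumed already scaled.
    Returns the indices of up to k representative rows.
    """
    if not points or k <= 0:
        return []
    k = min(k, len(points))
    selected = [start]
    # Distance from every point to its nearest already-selected center.
    nearest = [dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is currently worst covered.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[nxt] == 0:
            break  # every point coincides with a selected center
        selected.append(nxt)
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return selected


if __name__ == "__main__":
    rows = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
    print(k_center_coreset(rows, 3))  # [0, 4, 2]
```

In a pipeline like the one the abstract outlines, the returned indices could pick the rows shown to the LLM as in-context examples; farthest-point selection favors coverage of the data over density, which keeps near-duplicate rows out of a small prompt budget.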


Code & Models

Repository: cszhangLMU/RDDG (GitHub)
