In-Context Bias Propagation in LLM-Based Tabular Data Generation

Pol G. Recasens; Alberto Gutierrez; Jordi Torres; Josep Ll. Berral; Javier Carnerero-Cano; Anisa Halimi; Kieran Fraser

arXiv:2506.09630 · cs.LG · January 29, 2026

Open Access

TL;DR

This paper investigates how biases in in-context examples influence the fairness and statistical properties of synthetic tabular data generated by LLMs, revealing vulnerabilities to bias propagation and adversarial manipulation.

Contribution

The paper systematically analyzes bias propagation in LLM-based tabular data generation and proposes mitigation strategies to reduce disparity, highlighting a new vulnerability in sensitive-domain applications.

Findings

01 Bias in in-context examples causes global statistical distortions.

02 Adversarial bias injection can compromise downstream fairness.

03 Preprocessing can mitigate but not eliminate bias effects.
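
To make the notion of in-context bias concrete, the following is a minimal Python sketch, not the authors' code. It assumes a toy table with a hypothetical `group` attribute and `label` outcome, uses statistical parity difference as the disparity measure (a standard fairness metric, chosen here as an assumption), and simulates bias injection by relabeling part of the unprivileged group's positive rows before they are serialized into a prompt. The helper names `inject_bias` and `build_prompt` and the prompt wording are illustrative only; no LLM is called.

```python
from typing import Dict, List

Row = Dict[str, str]


def positive_rate(rows: List[Row], label_key: str, positive: str) -> float:
    """Fraction of rows whose label equals the positive outcome."""
    return sum(r[label_key] == positive for r in rows) / len(rows)


def statistical_parity_difference(rows: List[Row], group_key: str, privileged: str,
                                  label_key: str, positive: str) -> float:
    """P(positive | privileged) - P(positive | unprivileged)."""
    priv = [r for r in rows if r[group_key] == privileged]
    unpriv = [r for r in rows if r[group_key] != privileged]
    return positive_rate(priv, label_key, positive) - positive_rate(unpriv, label_key, positive)


def inject_bias(rows: List[Row], group_key: str, privileged: str, label_key: str,
                positive: str, negative: str, fraction: float) -> List[Row]:
    """Hypothetical bias injection: return a copy of the rows in which
    `fraction` of the unprivileged positive rows are relabeled negative."""
    targets = [i for i, r in enumerate(rows)
               if r[group_key] != privileged and r[label_key] == positive]
    flip = set(targets[:int(len(targets) * fraction)])
    return [dict(r, **{label_key: negative}) if i in flip else dict(r)
            for i, r in enumerate(rows)]


def build_prompt(rows: List[Row]) -> str:
    """Serialize rows as text for use as in-context examples (illustrative format)."""
    lines = [", ".join(f"{k} is {v}" for k, v in r.items()) for r in rows]
    return "Generate more rows like these:\n" + "\n".join(lines)


# Toy data (made up for illustration): both groups have a 50% positive rate.
clean = ([{"group": "A", "label": "yes"}] * 10 + [{"group": "A", "label": "no"}] * 10
         + [{"group": "B", "label": "yes"}] * 10 + [{"group": "B", "label": "no"}] * 10)

biased = inject_bias(clean, "group", "A", "label", "yes", "no", fraction=0.5)
spd_clean = statistical_parity_difference(clean, "group", "A", "label", "yes")
spd_biased = statistical_parity_difference(biased, "group", "A", "label", "yes")
prompt = build_prompt(biased[:3])
```

In this sketch the clean examples have zero disparity, while the manipulated examples favor group `A`. Applying the same metric to the rows a model generates from such a prompt would be one way to check whether the skew carries over.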

Abstract

Large Language Models (LLMs) are increasingly used for synthetic tabular data generation through in-context learning (ICL), offering a practical solution for data augmentation in data-scarce scenarios. While prior work has shown the potential of LLMs to improve downstream task performance by augmenting underrepresented groups, these benefits often assume access to a subset of unbiased in-context examples that is representative of the real dataset. In real-world settings, however, data is frequently noisy and demographically skewed. In this paper, we systematically study how statistical biases within in-context examples propagate to the distribution of synthetic tabular data, showing that even mild in-context biases lead to global statistical distortions. We further introduce an adversarial scenario where a malicious contributor can inject bias into the synthetic dataset via a subset of…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education