TL;DR
This paper introduces L2C2, a reinforcement learning framework that optimizes data cleaning for tabular foundation models by aligning real-world data with synthetic priors, improving accuracy and calibration.
Contribution
It presents the first deep RL approach for prior alignment in tabular data cleaning, demonstrating improved model performance and transferability across datasets.
Findings
Reward engineering is complex; some reward designs lead to trivial cleaning.
The TFMAwareReward improves pipeline selection and accuracy on certain datasets.
Parameterized cleaning actions enhance reward outcomes in most datasets.
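To make the first finding concrete, here is a minimal sketch of how a reward can be gamed into trivial cleaning. It is a hypothetical illustration only: the function names (`naive_reward`, `retention_aware_reward`), the `weight` parameter, and the toy table are assumptions made for this example, and none of them reproduces the paper's reward designs or TFMAwareReward.

```python
from typing import Optional

# A row is a list of numeric cells; None marks a missing value.
Row = list[Optional[float]]


def missing_fraction(rows: list[Row]) -> float:
    """Fraction of cells that are missing; 0.0 for an empty table."""
    cells = [value for row in rows for value in row]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)


def naive_reward(before: list[Row], after: list[Row]) -> float:
    """Hypothetical reward: reduction in missingness only."""
    return missing_fraction(before) - missing_fraction(after)


def retention_aware_reward(before: list[Row], after: list[Row],
                           weight: float = 1.0) -> float:
    """Hypothetical fix: also penalize the share of rows thrown away."""
    kept = len(after) / len(before) if before else 1.0
    return naive_reward(before, after) - weight * (1.0 - kept)


def drop_all_rows(rows: list[Row]) -> list[Row]:
    """Trivial 'cleaning': an empty table has no missing values."""
    return []


def impute_column_means(rows: list[Row]) -> list[Row]:
    """Fill each missing cell with the mean of its column's observed values."""
    if not rows:
        return []
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


dirty = [[1.0, None], [2.0, 4.0], [None, 6.0], [3.0, 8.0]]

# The naive reward cannot tell useful cleaning from deleting everything.
print(naive_reward(dirty, drop_all_rows(dirty)))        # 0.25
print(naive_reward(dirty, impute_column_means(dirty)))  # 0.25

# Penalizing lost rows separates the two.
print(retention_aware_reward(dirty, drop_all_rows(dirty)))        # -0.75
print(retention_aware_reward(dirty, impute_column_means(dirty)))  # 0.25
```

Under the naive reward, deleting the whole table scores exactly as well as imputing it, so a policy has no reason to prefer the useful operator. A retention penalty is only one possible guard and is used here purely for illustration.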
Abstract
Tabular Foundation Models (TFMs) achieve state-of-the-art zero-shot accuracy on small tabular datasets by meta-learning over synthetic data-generating processes, making them highly attractive for practitioners who cannot afford large annotated corpora. However, their in-context learning mechanism assumes approximately clean inputs: missing values, outliers, and duplicates in the real-world data create a prior mismatch that degrades both accuracy and confidence calibration simultaneously. Correcting this mismatch requires sequential decisions over cleaning operators whose interactions no static preprocessing rule can anticipate, a natural fit for reinforcement learning (RL). We introduce L2C2, the first deep RL framework framing tabular data cleaning as prior alignment: a learned policy sequences operators to minimize the distributional gap between dirty input and the TFM's synthetic…
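The sequential framing in the abstract can be sketched as a small decision loop over cleaning operators. This is a sketch under stated assumptions, not L2C2 itself: a greedy one-step search stands in for the learned RL policy, a hand-written `dirtiness` score stands in for the distributional gap to the TFM's prior, and the operator names and the MAD-based outlier rule are hypothetical choices made for the example.

```python
import statistics
from typing import Callable, Optional

Row = tuple[Optional[float], ...]   # None marks a missing cell
Table = list[Row]


def _observed(table: Table, j: int) -> list[float]:
    return [row[j] for row in table if row[j] is not None]


def _bounds(values: list[float], k: float = 3.0) -> tuple[float, float]:
    """Median +/- k * MAD; unbounded when the column has no spread."""
    if not values:
        return float("-inf"), float("inf")
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return float("-inf"), float("inf")
    return med - k * mad, med + k * mad


def drop_duplicates(table: Table) -> Table:
    return list(dict.fromkeys(table))  # keeps first occurrence, in order


def impute_median(table: Table) -> Table:
    if not table:
        return []
    meds = [statistics.median(_observed(table, j)) if _observed(table, j) else 0.0
            for j in range(len(table[0]))]
    return [tuple(meds[j] if v is None else v for j, v in enumerate(row))
            for row in table]


def clip_outliers(table: Table) -> Table:
    if not table:
        return []
    bounds = [_bounds(_observed(table, j)) for j in range(len(table[0]))]
    return [tuple(v if v is None else min(max(v, bounds[j][0]), bounds[j][1])
                  for j, v in enumerate(row)) for row in table]


def dirtiness(table: Table) -> float:
    """Assumed proxy for the gap: missing + duplicate + outlier fractions."""
    if not table:
        return 0.0
    n_cells = len(table) * len(table[0])
    missing = sum(v is None for row in table for v in row)
    duplicates = len(table) - len(set(table))
    outliers = 0
    for j in range(len(table[0])):
        lo, hi = _bounds(_observed(table, j))
        outliers += sum(v < lo or v > hi for v in _observed(table, j))
    return missing / n_cells + duplicates / len(table) + outliers / n_cells


OPERATORS: dict[str, Callable[[Table], Table]] = {
    "drop_duplicates": drop_duplicates,
    "impute_median": impute_median,
    "clip_outliers": clip_outliers,
}


def greedy_rollout(table: Table, max_steps: int = 5) -> tuple[Table, list[str]]:
    """Apply, at each step, the operator that lowers dirtiness the most."""
    chosen: list[str] = []
    for _ in range(max_steps):
        candidates = [(name, op(table)) for name, op in OPERATORS.items()]
        name, best = min(candidates, key=lambda c: dirtiness(c[1]))
        if dirtiness(best) >= dirtiness(table):
            break  # no operator helps any more
        table = best
        chosen.append(name)
    return table, chosen


dirty: Table = [(1.0, 10.0), (1.0, 10.0), (2.0, None), (3.0, 12.0), (100.0, 11.0)]
cleaned, steps = greedy_rollout(dirty)
print(steps)
print(cleaned)
```

Each operator changes the statistics the next one sees (for example, removing the duplicate row shifts the column medians used for imputation and clipping), which is the kind of interaction a fixed preprocessing rule cannot anticipate. A learned policy would replace `greedy_rollout` and could optimize over whole sequences rather than one step at a time.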
