C$^{2}$TC: A Training-Free Framework for Efficient Tabular Data Condensation

Sijia Xu; Fan Li; Xiaoyang Wang; Zhengyi Yang; Xuemin Lin

arXiv:2602.21717 · cs.LG · February 26, 2026

TL;DR

C$^{2}$TC is a novel training-free framework for tabular data condensation that efficiently synthesizes small, informative datasets by jointly optimizing class allocation and feature representation, addressing computational challenges and data heterogeneity.

Contribution

It introduces the first training-free tabular dataset condensation method using class-adaptive clustering and a heuristic local search, significantly improving efficiency and effectiveness.
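
As a rough illustration of what clustering-based condensation with a per-class budget can look like, the sketch below splits a row budget across classes and then replaces each class with its k-means centroids. This is not the authors' algorithm: the allocation rule (one row per class, remainder proportional to class size), the plain k-means step, and the names `allocate_budget`, `kmeans`, and `condense` are all assumptions made for illustration. It handles numeric features only and omits the heuristic local search that the paper uses to choose the allocation.

```python
import random
from collections import defaultdict


def allocate_budget(class_counts, budget):
    """Hypothetical rule: one row per class, remainder proportional to class size."""
    classes = sorted(class_counts)
    if budget < len(classes):
        raise ValueError("budget must cover at least one row per class")
    alloc = {c: 1 for c in classes}
    remaining = budget - len(classes)
    total = sum(class_counts.values())
    shares = {c: remaining * class_counts[c] / total for c in classes}
    for c in classes:
        alloc[c] += int(shares[c])
    # Hand any leftover rows to the classes with the largest fractional share.
    leftover = budget - sum(alloc.values())
    by_fraction = sorted(classes, key=lambda c: shares[c] - int(shares[c]), reverse=True)
    for c in by_fraction[:leftover]:
        alloc[c] += 1
    return alloc


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns at most k centroids."""
    rng = random.Random(seed)
    k = min(k, len(rows))  # cannot have more centroids than rows
    centers = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centers[j])),
            )
            groups[nearest].append(r)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def condense(X, y, budget, seed=0):
    """Replace each class with the centroids of its clusters (numeric features only)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    alloc = allocate_budget({c: len(r) for c, r in by_class.items()}, budget)
    X_small, y_small = [], []
    for c in sorted(by_class):
        for center in kmeans(by_class[c], alloc[c], seed=seed):
            X_small.append(center)
            y_small.append(c)
    return X_small, y_small


# Toy usage: 8 majority rows, 2 minority rows, condensed to 4 synthetic rows.
X = [[float(i), float(i)] for i in range(8)] + [[10.0, 10.0], [12.0, 12.0]]
y = [0] * 8 + [1] * 2
X_small, y_small = condense(X, y, budget=4)
```

No gradient-based optimization is involved, which is the sense in which such an approach is training-free. The minimum of one row per class is a simple guard against dropping minority classes under imbalance.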

Findings

01. At least 100x faster than existing methods.

02. Achieves superior downstream task performance.

03. Effectively handles class imbalance and heterogeneous features.

Abstract

Tabular data is the primary data format in industrial relational databases, underpinning modern data analytics and decision-making. However, the increasing scale of tabular data poses significant computational and storage challenges to learning-based analytical systems. This highlights the need for data-efficient learning, which enables effective model training and generalization using substantially fewer samples. Dataset condensation (DC) has emerged as a promising data-centric paradigm that synthesizes small yet informative datasets to preserve data utility while reducing storage and training costs. However, existing DC methods are computationally intensive due to reliance on complex gradient-based optimization. Moreover, they often overlook key characteristics of tabular data, such as heterogeneous features and class imbalance. To address these limitations, we introduce C$^{2}$TC…

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Imbalanced Data Classification Techniques