T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

TL;DR
T-JEPA is an augmentation-free self-supervised learning method for tabular data. It learns representations by predicting, in latent space, one subset of a sample's features from another subset of the same sample, and using it for pre-training improves downstream task performance.
Contribution
The paper proposes T-JEPA, a novel augmentation-free SSL approach for tabular data using latent space predictions, and introduces regularization tokens for effective training.
Findings
Outperforms traditional models on classification and regression tasks.
Enables some methods to match or surpass Gradient Boosted Decision Trees.
Effectively identifies relevant features without label access.
Abstract
Self-supervision is often used for pre-training to foster performance on a downstream task by constructing meaningful representations of samples. Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations that are challenging to construct for tabular data. This constitutes one of the main challenges of self-supervision for structured data. In the present work, we propose a novel augmentation-free SSL method for tabular data. Our approach, T-JEPA, relies on a Joint Embedding Predictive Architecture (JEPA) and is akin to mask reconstruction in the latent space. It involves predicting the latent representation of one subset of features from the latent representation of a different subset within the same sample, thereby learning rich representations without augmentations. We use our method as a pre-training technique…
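The latent-space prediction described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy mean-pooling encoder, the linear predictor `P`, the squared-error loss, and the exponential-moving-average (EMA) update of the target encoder, which is common in JEPA-style methods but is not stated in the abstract. All function and parameter names are hypothetical.

```python
import numpy as np

def sample_disjoint_masks(n_features, n_context, n_target, rng):
    """Pick non-overlapping context and target feature indices for one sample."""
    perm = rng.permutation(n_features)
    return np.sort(perm[:n_context]), np.sort(perm[n_context:n_context + n_target])

def encode(x, idx, feat_emb, W):
    """Toy encoder (assumption): embed selected features, mean-pool, project."""
    tokens = x[idx, None] * feat_emb[idx]    # shape (len(idx), d)
    return np.tanh(tokens.mean(axis=0) @ W)  # shape (d,)

def jepa_loss(x, ctx_idx, tgt_idx, params, target_params):
    """Squared L2 distance between the predicted and the actual target latent."""
    h_ctx = encode(x, ctx_idx, params["emb"], params["W"])
    # In a real training loop no gradient would flow through the target branch.
    h_tgt = encode(x, tgt_idx, target_params["emb"], target_params["W"])
    pred = h_ctx @ params["P"]               # hypothetical linear predictor
    return float(np.sum((pred - h_tgt) ** 2))

def ema_update(target_params, params, tau=0.99):
    """Assumed EMA update: move target-encoder weights toward the context encoder."""
    for key in target_params:
        target_params[key] = tau * target_params[key] + (1 - tau) * params[key]

rng = np.random.default_rng(0)
n_features, d = 8, 4
params = {
    "emb": rng.normal(size=(n_features, d)),  # per-feature embedding vectors
    "W": rng.normal(size=(d, d)),             # encoder projection
    "P": rng.normal(size=(d, d)),             # predictor weights
}
target_params = {k: params[k].copy() for k in ("emb", "W")}

x = rng.normal(size=n_features)               # one tabular sample (numeric features only)
ctx_idx, tgt_idx = sample_disjoint_masks(n_features, 3, 2, rng)
loss = jepa_loss(x, ctx_idx, tgt_idx, params, target_params)
ema_update(target_params, params)
```

Note that no second, augmented view of `x` is ever constructed: the two "views" are simply two disjoint subsets of the same row's features, which is what makes the approach augmentation-free.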
Peer Reviews
Decision·ICLR 2025 Poster
The authors' idea of applying JEPA for representation learning on tabular data is original. The work is also well-supported by extensive experiments. In general, coming up with a masking strategy to make JEPA work for tabular data constitutes a strong engineering contribution. Tabular data, unlike other data modalities such as vision, speech, or text, is highly heterogeneous and complex: each feature has its own distribution, some might be categorical while others are continuous with varying degr
I am not convinced about the contributions of Sections 5.1 and 5.3. These sections look artificial and drawn-out to me. ## Comments on Section 5.1 The proposed metrics in Section 5.1 have little to do with the optimized objective, Equation 5. Therefore, I have no intuition of what we want these metrics to be: what is their range? what values are good / bad? how fast / slow do we want them to converge? why do we think optimizing our objective promotes this? The authors' arguments such as > Line
1. The introduction of a regularization token is an interesting idea. The authors conduct thorough ablation studies to validate its effectiveness. I appreciate that this technique is not only technically sound but also produces convincing experimental results. 2. The experimental design is comprehensive. The paper evaluates its approach across multiple dimensions: (1) testing on various backbones with and without T-JEPA, (2) comparing backbone+T-JEPA with backbone+PTaRL, (3) benchmarking against
1. The paper lacks detailed information about the model architectures used in the main experiments. For example, does ResNet refer to ResNet-18 or ResNet-50? Providing the exact hyperparameter specifications would enhance the reproducibility of the experiments. 2. When compared to PTaRL, which is also an enhancement method for tabular learning, the performance gain is not competitive. Although the idea is novel, the lackluster performance severely limits the contribution of this paper. The authors a
1. The proposed SSL method is model-agnostic and can be coupled with various deep tabular models, which gives it a wide range of applications. 2. The strategy for masking representations is novel in the tabular domain. 3. The experiments are extensive.
Please refer to the questions below.
Taxonomy
Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Neural Networks and Applications
