TL;DR
This paper investigates the inference process of transformer-based tabular models, revealing depth redundancy and proposing a minimal single-layer model with comparable performance.
Contribution
It provides the first large-scale mechanistic analysis of layerwise inference dynamics in six tabular in-context learning models and introduces a proof-of-concept looped single-layer model with comparable performance.
Findings
Substantial depthwise redundancy across multiple models.
Inference proceeds as iterative refinement, with overlapping computations across inference stages.
A looped single-layer model achieves comparable performance with only 20% of the original model's parameters.
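To make the idea of a looped single-layer model concrete, here is a minimal sketch of weight tying across depth. It is not the authors' architecture. The residual MLP block stands in for a transformer layer, and all names (`init_layer`, `apply_layer`, `looped_forward`) and sizes are hypothetical. The toy depth of 5 is chosen only so that the parameter ratio works out to 1/5. It is not a claim about the depth of any model studied in the paper.

```python
import numpy as np

def init_layer(rng, d_model, d_hidden):
    """Parameters of one toy residual MLP block (a stand-in for a transformer layer)."""
    return {
        "w_in": rng.normal(0.0, 0.02, size=(d_model, d_hidden)),
        "w_out": rng.normal(0.0, 0.02, size=(d_hidden, d_model)),
    }

def apply_layer(params, x):
    """Residual update: x + relu(x @ w_in) @ w_out."""
    hidden = np.maximum(x @ params["w_in"], 0.0)
    return x + hidden @ params["w_out"]

def stacked_forward(layers, x):
    """Standard deep model: every layer has its own parameters."""
    for params in layers:
        x = apply_layer(params, x)
    return x

def looped_forward(params, x, n_loops):
    """Looped model: the same parameters are applied n_loops times."""
    for _ in range(n_loops):
        x = apply_layer(params, x)
    return x

def count_params(layers):
    """Total number of scalar parameters in a list of layers."""
    return sum(w.size for params in layers for w in params.values())

rng = np.random.default_rng(0)
d_model, d_hidden, depth = 16, 32, 5  # hypothetical toy sizes

stack = [init_layer(rng, d_model, d_hidden) for _ in range(depth)]
shared = init_layer(rng, d_model, d_hidden)

x = rng.normal(size=(8, d_model))
y_stack = stacked_forward(stack, x)
y_loop = looped_forward(shared, x, n_loops=depth)

# Same compute depth, but the looped model stores one layer instead of five.
ratio = count_params([shared]) / count_params(stack)
```

Both forward passes perform the same number of layer applications, so looping saves parameters rather than compute. Whether a looped layer matches the stacked model's accuracy is an empirical question that this sketch does not address.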
Abstract
Transformer-based tabular foundation models (TFMs) dominate small to medium tabular predictive benchmark tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in 6 state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference and reveal latent-space dynamics that differ from those of language models. Our findings indicate substantial depthwise redundancy across multiple models, suggesting iterative refinement with overlapping computations during inference stages. Guided by these insights, we design a proof-of-concept, looped single-layer model that uses only 20% of the original model's parameters while achieving comparable performance. The code is available at https://github.com/amirbalef/is_one_layer_enough.
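One generic way to probe depthwise redundancy is to compare each layer's hidden representation with the next layer's. The sketch below computes the mean cosine similarity between consecutive layers, which is a common diagnostic and not necessarily the analysis used in the paper. The function name `consecutive_cosine` and the synthetic hidden states are assumptions made for illustration. In practice the states would be collected from a real model's intermediate layers.

```python
import numpy as np

def consecutive_cosine(hidden_states):
    """Mean cosine similarity between each layer's output and the next layer's.

    hidden_states: list of arrays of shape (n_tokens, d_model), one per layer.
    Returns a list with one similarity per consecutive pair of layers.
    """
    sims = []
    for a, b in zip(hidden_states[:-1], hidden_states[1:]):
        num = np.sum(a * b, axis=-1)
        den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
        sims.append(float(np.mean(num / den)))
    return sims

# Synthetic stand-in for layerwise activations: each "layer" only nudges
# the representation slightly, mimicking a highly redundant stack.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
states = [h]
for _ in range(3):
    h = h + 0.01 * rng.normal(size=h.shape)
    states.append(h)

sims = consecutive_cosine(states)
```

Similarities close to 1 mean that a layer barely changes the direction of the representation, which is consistent with, though not proof of, redundant or incremental computation.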
