MotherNet: Fast Training and Inference via Hyper-Network Transformers
Andreas Müller, Carlo Curino, Raghu Ramakrishnan

TL;DR
MotherNet introduces a hypernetwork architecture that rapidly generates neural network weights for tabular data classification, achieving high efficiency and competitive accuracy without dataset-specific tuning.
Contribution
It presents MotherNet, a hypernetwork trained on synthetic tasks that can generate effective classifiers for new tabular datasets in a single forward pass, surpassing existing hypernetworks in flexibility and efficiency.
Findings
MotherNet outperforms neural networks trained with gradient descent on small datasets.
MotherNet's generated models are comparable to TabPFN and Gradient Boosting in accuracy.
MotherNet requires no fine-tuning or dataset-specific hyper-parameter tuning.
Abstract
Foundation models are transforming machine learning across many modalities, with in-context learning replacing classical model training. Recent work on tabular data hints at a similar opportunity to build foundation models for classification for numerical data. However, existing meta-learning approaches cannot compete with tree-based methods in terms of inference time. In this paper, we propose MotherNet, a hypernetwork architecture trained on synthetic classification tasks that, once prompted with a never-seen-before training set, generates the weights of a trained "child" neural network by in-context learning using a single forward pass. In contrast to most existing hypernetworks, which are usually trained for relatively constrained multi-task settings, MotherNet can create models for multiclass classification on arbitrary tabular datasets without any dataset-specific gradient…
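The data flow described in the abstract (training set in, child-network weights out, no gradient steps on the new dataset) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' architecture: `toy_hypernetwork` and `child_predict` are hypothetical names, and the weight-generating rule (a fixed random hidden layer with a class-mean readout) stands in for the trained transformer that MotherNet actually uses.

```python
import numpy as np


def toy_hypernetwork(X_train, y_train, n_classes, hidden=8, seed=0):
    """Hypothetical stand-in for a trained hypernetwork.

    Maps an entire labeled training set to the weights of a small child MLP
    in a single call, with no gradient descent on this dataset. The mapping
    used here is an assumption chosen for brevity, not the paper's method.
    Every class in range(n_classes) must appear in y_train.
    """
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]

    # Hidden layer: fixed random projection (a real hypernetwork would emit this).
    W1 = rng.normal(size=(n_features, hidden))
    b1 = np.zeros(hidden)
    H = np.maximum(X_train @ W1 + b1, 0.0)  # ReLU activations on the training set

    # Output layer: one column per class, the centered mean activation of that class.
    mu = H.mean(axis=0)
    W2 = np.stack(
        [H[y_train == c].mean(axis=0) - mu for c in range(n_classes)], axis=1
    )
    b2 = -mu @ W2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}


def child_predict(weights, X):
    """Run the generated child MLP; the training set is no longer needed."""
    H = np.maximum(X @ weights["W1"] + weights["b1"], 0.0)
    logits = H @ weights["W2"] + weights["b2"]
    return logits.argmax(axis=1)


# Usage: generate the child once, then predict with a plain MLP forward pass.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = np.array([0, 1] * 10)
child = toy_hypernetwork(X, y, n_classes=2)
predictions = child_predict(child, X)
```

The point of the split is that prediction depends only on the generated weights, so inference cost is that of a small MLP rather than a pass over the training set.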
Peer Reviews
Decision·ICLR 2025 Poster
**Strengths**:
- **Creative Approach**: MotherNet is an inventive application of hypernetworks and transformer-based architectures, demonstrating how in-context learning can effectively generate task-specific models without gradient descent.
- **Efficiency and Speed**: The method achieves significant inference speed improvements over TabPFN, offering practical advantages for use cases requiring fast, on-demand predictions.
- **Eliminates Hyper-Parameter Tuning**: MotherNet operates without per-d
- **Scalability Constraints**: MotherNet, like TabPFN, is bound by the quadratic memory requirements of transformers, limiting its usability for datasets larger than around 5,000 samples. This could be viewed as a major limitation for more extensive applications.
- **Presentation Gaps**: Some sections, particularly in the methodology and results, could benefit from improved clarity and structure to enhance readability and understanding.
- **Comparison Depth**: Although the evaluation includes mu
- The fact that this approach works well is surprising and compelling. It appears that training on synthetic datasets has a useful regularizing effect in generating MLPs compared to standard MLPs that are just trained on the dataset in question.
- The presentation is generally very clear.
- The experimental results aren't very compelling on the whole. The CC-18 evaluation is limited to very small datasets and the actual differences in AUC between the top ten models are very small. Given the scale of the data and relative effectiveness of very simple models like logistic regression, it's tough to see the computational efficiency of the proposed model as actually mattering in practice, especially since it's using GPU hardware. The Tabzilla evaluation provides a wider range of datas
I like the overall idea of the project. This paper shows an interesting new path with certain benefits over the original TabPFN, including the high inference efficiency. *(Assuming that the authors will share the code and the model checkpoints linked in the paper)* To me, a big positive thing is the provided code for training MotherNet and the model weights. Training hypernetworks is non-trivial and costly, and I like that this project gives the community a ready-to-use hypernetwork baseline.
**Benchmarks** I have mixed feelings about the benchmarks. I tend to agree that MotherNet outperforms HyperFast; however, the overall ranking of models is less convincing to me. I can imagine that the relative performance of the methods will change significantly in a different setup, and especially on different datasets. Regarding datasets, if I understand correctly, there are three parts of results:
- (Figure 2 and Table 1) The performance on datasets with unusual ranking of models, in parti
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training · tabular data Prior-data Fitted Network · HyperNetwork
