
TL;DR
This paper compares mixture-of-experts (MoE) models with standard multilayer perceptrons (MLPs) on tabular data, showing that GG MoE achieves higher performance than MLPs on 38 datasets while using fewer parameters.
Contribution
Introduces GG MoE, a novel mixture-of-experts model with Gumbel-Softmax gating, demonstrating superior performance and efficiency over MLPs on tabular data.
Findings
GG MoE outperforms MLPs on 38 datasets
GG MoE uses fewer parameters than MLPs
Ensembles of MLPs are less efficient than GG MoE
Abstract
In recent years, significant efforts have been directed toward adapting modern neural network architectures for tabular data. However, despite their larger number of parameters and longer training and inference times, these models often fail to consistently outperform vanilla multilayer perceptron (MLP) neural networks. Moreover, MLP-based ensembles have recently demonstrated superior performance and efficiency compared to advanced deep learning methods. Therefore, rather than focusing on building deeper and more complex deep learning models, we propose investigating whether MLP neural networks can be replaced with more efficient architectures without sacrificing performance. In this paper, we first introduce GG MoE, a mixture-of-experts (MoE) model with a Gumbel-Softmax gating function. We then demonstrate that GG MoE with an embedding layer achieves the highest performance across …
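As a rough illustration of the gating idea named in the abstract, the sketch below mixes the outputs of a few experts using weights drawn with the standard Gumbel-Softmax relaxation: softmax((logits + Gumbel noise) / tau). It is a minimal sketch under stated assumptions, not the paper's implementation. This summary does not specify the expert architecture, the embedding layer, or the temperature schedule, so each expert here is assumed to be a single linear layer, and the names `gumbel_softmax`, `linear`, `moe_forward`, and `tau` are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample relaxed one-hot weights: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def moe_forward(x, gate, experts, tau=1.0, rng=random):
    """Mix expert outputs with Gumbel-Softmax gate weights.

    Assumption: every expert is one linear layer (weights, bias);
    the paper's experts may differ.
    """
    gate_w, gate_b = gate
    gate_weights = gumbel_softmax(linear(gate_w, gate_b, x), tau, rng)
    outputs = [linear(w, b, x) for w, b in experts]
    n_out = len(outputs[0])
    mixed = [sum(g * out[j] for g, out in zip(gate_weights, outputs))
             for j in range(n_out)]
    return mixed, gate_weights


# Toy example: 3 input features, 2 experts, 1 output (all values made up).
rng = random.Random(0)
x = [0.5, -1.0, 2.0]
gate = ([[0.1, 0.2, 0.3], [-0.3, 0.0, 0.1]], [0.0, 0.0])
experts = [
    ([[1.0, 0.0, 0.0]], [0.0]),  # expert 0 returns x[0]
    ([[0.0, 1.0, 0.0]], [0.5]),  # expert 1 returns x[1] + 0.5
]
y, g = moe_forward(x, gate, experts, tau=0.5, rng=rng)
print(y, g)
```

Lower values of `tau` push the sampled gate weights toward a one-hot choice of expert, while higher values spread weight more evenly across experts.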
Taxonomy
Topics: Natural Language Processing Techniques · Data Mining Algorithms and Applications · Algorithms and Data Compression
