Sparse tree-based initialization for neural networks
Patrick Lutz (BU), Ludovic Arnould (LPSM (UMR_8001)), Claire Boyer (LPSM (UMR_8001)), Erwan Scornet (CMAP)

TL;DR
This paper introduces a sparse initialization method for neural networks based on tree ensemble insights, significantly improving performance on tabular data and rivaling gradient boosting methods.
Contribution
It presents a novel tree-based initialization technique for neural networks that enhances their performance on tabular data and maintains sparsity in weights.
Findings
Outperforms default MLP initialization on tabular datasets
Rivals complex deep learning solutions and gradient boosting
Preserves sparsity, acting as implicit regularization
Abstract
Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among state-of-the-art methods for dealing with these data. Unfortunately, no architecture has been found for dealing with tabular data yet, for which tree ensemble methods (tree boosting, random forests) usually show the best predictive performance. In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLP): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. Numerical experiments on several tabular data sets show that this new, simple and easy-to-use method is a solid competitor, both in terms of generalization capacity and…
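To make the idea of a tree-informed sparse initialization concrete, here is a minimal sketch of one common way to turn tree splits into a sparse first layer: one hidden unit per split, with a single nonzero weight on the split feature and a bias equal to minus the threshold. It is an illustration under stated assumptions, not the authors' exact procedure. The names `sparse_first_layer`, `forward`, and `scale`, as well as the example splits, are hypothetical; the splits are assumed to have been read off an already-fitted tree ensemble.

```python
import math


def sparse_first_layer(splits, n_features):
    """Build (weights, biases) with one hidden unit per tree split.

    splits: list of (feature_index, threshold) pairs, assumed to come from
    the internal nodes of an already-fitted tree ensemble (hypothetical input).
    """
    weights, biases = [], []
    for feature, threshold in splits:
        row = [0.0] * n_features
        row[feature] = 1.0         # single nonzero weight: the split feature
        weights.append(row)
        biases.append(-threshold)  # pre-activation > 0 iff x[feature] > threshold
    return weights, biases


def forward(weights, biases, x, scale=10.0):
    """tanh(scale * (W x + b)): a smooth stand-in for the hard split test."""
    return [
        math.tanh(scale * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


# Two illustrative splits: x[0] > 0.5 and x[2] > 1.5.
splits = [(0, 0.5), (2, 1.5)]
W, b = sparse_first_layer(splits, n_features=3)
h = forward(W, b, [0.9, 7.0, 1.0])  # first unit near +1, second near -1
```

Each row of `W` has exactly one nonzero entry, so the layer starts out sparse; the network would then be trained with a standard stochastic gradient method, as the abstract describes.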
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Machine Learning and ELM
