Data-driven Weight Initialization with Sylvester Solvers
Debasmit Das, Yash Bhalgat, Fatih Porikli

TL;DR
This paper introduces a novel data-driven weight initialization method for deep neural networks using Sylvester equations, leading to improved performance especially in few-shot and fine-tuning scenarios.
Contribution
It presents a layer-wise initialization approach based on input activations, formulated as a Sylvester equation, offering a fast, gradient-free, and data-informed alternative to random initialization.
Findings
Improves performance over random initialization.
Effective in few-shot and fine-tuning settings.
Analyzes the method's time complexity and the effect of the chosen latent code.
Abstract
In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches which randomly initialize parameters by sampling from transformed standard distributions. Such methods do not use the training data to produce a more informed initialization. Our method uses a sequential layer-wise approach where each layer is initialized using its input activations. The initialization is cast as an optimization problem where we minimize a combination of encoding and decoding losses of the input activations, which is further constrained by a user-defined latent code. The optimization problem is then restructured into the well-known Sylvester equation, which has fast and efficient gradient-free solutions. Our data-driven method achieves a boost in performance compared to random initialization methods, both before start of…
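To make the reduction to a Sylvester equation concrete, here is a minimal sketch for a single linear layer. The abstract does not give the exact loss, so the objective below is an assumption: an encoding term ||WX - S||² plus a decoding term λ||X - WᵀS||², where X holds the input activations, S is the user-defined latent code, and λ is a weighting factor. The names `sylvester_init`, `X`, `S`, and `lam` are hypothetical. Setting the gradient with respect to W to zero gives (λSSᵀ)W + W(XXᵀ) = (1 + λ)SXᵀ, which has the standard form AW + WB = Q and can be solved without gradient descent using SciPy's `solve_sylvester`.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def sylvester_init(X, S, lam=1.0):
    """Return W minimizing ||W X - S||^2 + lam * ||X - W^T S||^2.

    ASSUMED objective (not taken from the paper text shown here).
    X: (d, n) input activations, S: (m, n) latent code, W: (m, d).
    """
    A = lam * (S @ S.T)          # (m, m), multiplies W from the left
    B = X @ X.T                  # (d, d), multiplies W from the right
    Q = (1.0 + lam) * (S @ X.T)  # (m, d), right-hand side
    # solve_sylvester solves A W + W B = Q for W.
    return solve_sylvester(A, B, Q)


# Toy data: random activations and a random latent code (illustrative only).
rng = np.random.default_rng(0)
d, m, n = 8, 4, 200
lam = 0.5
X = rng.standard_normal((d, n))
S = rng.standard_normal((m, n))

W = sylvester_init(X, S, lam=lam)

# Half the gradient of the assumed objective; it should vanish at the solution.
grad = (W @ X - S) @ X.T - lam * S @ (X - W.T @ S).T
rel_grad = np.linalg.norm(grad) / np.linalg.norm(S @ X.T)
print(W.shape, rel_grad)
```

In a sequential layer-wise scheme like the one the abstract describes, the activations produced by one initialized layer would serve as the X for the next layer. How the latent code S is chosen is left open here.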
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
