Data-driven Weight Initialization with Sylvester Solvers

Debasmit Das; Yash Bhalgat; Fatih Porikli

arXiv:2105.10335 · cs.NE · May 24, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a novel data-driven weight initialization method for deep neural networks using Sylvester equations, leading to improved performance especially in few-shot and fine-tuning scenarios.

Contribution

It presents a layer-wise initialization approach based on input activations, formulated as a Sylvester equation, offering a fast, gradient-free, and data-informed alternative to random initialization.
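To show how a layer-wise initialization problem can lead to a Sylvester equation, here is a short derivation. It makes an assumption the summary does not state: the encoding and decoding losses are squared Frobenius errors that share one weight matrix W. The paper's exact objective may differ. Let X ∈ ℝ^{d×n} be a layer's input activations (one column per sample), S ∈ ℝ^{k×n} a user-defined latent code, and λ > 0 a weighting term.

```latex
\min_{W \in \mathbb{R}^{k \times d}} \;
  \underbrace{\lVert X - W^{\top} S \rVert_F^2}_{\text{decoding}}
  + \lambda \, \underbrace{\lVert W X - S \rVert_F^2}_{\text{encoding}}

% Setting the gradient with respect to W to zero:
2\, S S^{\top} W - 2\, S X^{\top} + 2\lambda\, W X X^{\top} - 2\lambda\, S X^{\top} = 0

% Rearranged, this is the Sylvester equation A W + W B = C:
\underbrace{S S^{\top}}_{A} W + W \underbrace{\lambda X X^{\top}}_{B}
  = \underbrace{(1 + \lambda)\, S X^{\top}}_{C}
```

The objective is a convex quadratic in W, so any solution of this equation is a global minimizer. A unique solution exists when A and -B share no eigenvalue. Both SS^⊤ and XX^⊤ are positive semidefinite, so under this assumed objective uniqueness holds whenever λ > 0 and either SS^⊤ or XX^⊤ is nonsingular.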

Findings

01

Improves performance over random initialization.

02

Effective in few-shot and fine-tuning settings.

03

Analyzes time complexity and latent code effects.

Abstract

In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches which randomly initialize parameters by sampling from transformed standard distributions. Such methods do not use the training data to produce a more informed initialization. Our method uses a sequential layer-wise approach where each layer is initialized using its input activations. The initialization is cast as an optimization problem where we minimize a combination of encoding and decoding losses of the input activations, which is further constrained by a user-defined latent code. The optimization problem is then restructured into the well-known Sylvester equation, which has fast and efficient gradient-free solutions. Our data-driven method achieves a boost in performance compared to random initialization methods, both before start of…
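The following NumPy sketch illustrates the sequential layer-wise scheme described in the abstract. It relies on assumptions that the abstract does not state. The losses are taken to be ||X - WᵀS||²_F (decoding) and λ||WX - S||²_F (encoding) with a shared weight matrix W. Biases are omitted. Each layer's latent code is a random projection of that layer's inputs. The helper names `solve_symmetric_sylvester`, `sylvester_layer_init`, and `init_mlp` are hypothetical and do not come from the authors' code. Under these assumptions, each layer reduces to SSᵀW + W(λXXᵀ) = (1 + λ)SXᵀ. Because both coefficient matrices are symmetric, the equation can be solved in closed form by diagonalizing them, with no gradient steps.

```python
import numpy as np


def solve_symmetric_sylvester(A, B, C):
    """Solve A @ W + W @ B = C when A and B are symmetric.

    Diagonalizing A = U diag(a) U^T and B = V diag(b) V^T turns the equation
    into diag(a) W' + W' diag(b) = U^T C V, which is solved elementwise.
    """
    a, U = np.linalg.eigh(A)
    b, V = np.linalg.eigh(B)
    denom = a[:, None] + b[None, :]
    if np.any(np.abs(denom) < 1e-12):
        raise ValueError("A and -B share an eigenvalue; no unique solution")
    W_rot = (U.T @ C @ V) / denom
    return U @ W_rot @ V.T


def sylvester_layer_init(X, S, lam=1.0):
    """Return W (k x d) minimizing ||X - W^T S||_F^2 + lam * ||W X - S||_F^2.

    X: layer input activations, shape (d, n), one column per sample.
    S: latent code for this layer, shape (k, n).
    """
    A = S @ S.T                  # (k, k)
    B = lam * (X @ X.T)          # (d, d)
    C = (1.0 + lam) * (S @ X.T)  # (k, d)
    return solve_symmetric_sylvester(A, B, C)


def init_mlp(X, widths, lam=1.0, seed=0):
    """Hypothetical helper: initialize a bias-free ReLU MLP layer by layer.

    Each layer's latent code is a scaled random projection of its own input
    activations; this choice of code is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    weights = []
    H = X
    for k in widths:
        R = rng.standard_normal((k, H.shape[0])) / np.sqrt(H.shape[0])
        S = R @ H                            # hypothetical latent code
        W = sylvester_layer_init(H, S, lam)  # closed-form, no gradients
        weights.append(W)
        H = np.maximum(W @ H, 0.0)           # activations for the next layer
    return weights


rng = np.random.default_rng(1)
X = rng.standard_normal((20, 200))           # 20 features, 200 samples
weights = init_mlp(X, widths=[16, 8], lam=0.5)
print([W.shape for W in weights])            # [(16, 20), (8, 16)]
```

The eigendecomposition route works only because SSᵀ and XXᵀ are symmetric. For general coefficient matrices, `scipy.linalg.solve_sylvester(a, b, q)` solves AX + XB = Q with the Bartels-Stewart algorithm. Per layer, the cost is dominated by two symmetric eigendecompositions, O(k³ + d³), plus the O(n·d·(d + k)) products that form A, B, and C.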

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications