Risk Bounds on MDL Estimators for Linear Regression Models with Application to Simple ReLU Neural Networks

Yoshinari Takeishi; Jun'ichi Takeuchi

arXiv:2407.03854·cs.IT·November 19, 2024

Open Access

TL;DR

This paper establishes tight upper bounds on the risk of MDL estimators for simple two-layer ReLU neural networks, showing that the redundancy of the two-stage codes is small and that the risk bound does not depend on the number of hidden-layer nodes $m$.

Contribution

The paper introduces a method to design two-stage codes for linear regression models and applies it to simple ReLU neural networks, deriving risk bounds independent of hidden layer size.
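
As a rough illustration of what a two-stage code for a linear regression model looks like, the following is a minimal sketch and not the paper's construction. It assumes Gaussian noise with known standard deviation `sigma`, a uniform code over a coordinate-wise grid of spacing `delta` on `[-bound, bound]`, and a rounded least-squares estimate as an approximate minimizer of the total length. The function name `two_stage_code_length`, the grid, and all numerical values are hypothetical choices made for this example.

```python
import numpy as np

def two_stage_code_length(Phi, y, sigma, delta, bound):
    """Two-stage description length (in nats) for grid spacing `delta`.

    Stage 1 encodes a quantized parameter with a uniform code over the grid.
    Stage 2 encodes y with the Gaussian model at that quantized parameter
    (negative log density, i.e. code length up to a discretization constant).
    """
    n, m = Phi.shape
    v_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    # Round the least-squares estimate to the grid {-K*delta, ..., K*delta}.
    # This is a heuristic stand-in for minimizing over all grid points.
    K = int(np.floor(bound / delta))
    k = np.clip(np.round(v_ls / delta), -K, K)
    v_q = k * delta

    param_len = m * np.log(2 * K + 1)                    # stage 1
    rss = np.sum((y - Phi @ v_q) ** 2)
    data_len = 0.5 * n * np.log(2 * np.pi * sigma**2) + rss / (2 * sigma**2)  # stage 2
    return param_len + data_len, v_q

# Synthetic data (illustrative values only).
rng = np.random.default_rng(1)
n, m = 500, 3
Phi = rng.standard_normal((n, m))
v_true = np.array([0.5, -0.25, 0.125])
sigma = 0.5
y = Phi @ v_true + sigma * rng.standard_normal(n)

# Choose the grid spacing by minimizing the total length; the choice of
# delta itself costs log(len(deltas)) nats under a uniform code.
deltas = [2.0 ** -k for k in range(0, 11)]
lengths = [
    two_stage_code_length(Phi, y, sigma, dlt, bound=1.0)[0] + np.log(len(deltas))
    for dlt in deltas
]
best_delta = deltas[int(np.argmin(lengths))]
```

A coarse grid makes the parameter cheap to describe but fits the data poorly, while a fine grid does the opposite; the selected `best_delta` balances the two terms. The uniform grid here treats all parameter directions alike, which is the simplest possible design and is used only to show the structure of the trade-off.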

Findings

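01. The risk bound for simple ReLU networks has leading term $O(d^2 \log n / n)$, where $d$ is the number of input nodes and $n$ is the sample size.

02. The redundancy of the two-stage codes is small because the eigenvalue distribution of the Fisher information matrix is strongly biased.

03. The risk bound is independent of $m$, the number of hidden-layer nodes and hence of estimated parameters.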

Abstract

To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds of MDL estimators based on two-stage codes for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we propose a method to design two-stage codes for linear regression models and establish an upper bound on the risk of the corresponding MDL estimators, based on the theory of MDL estimators originating with Barron and Cover (1991). We then apply this result to simple two-layer NNs with ReLU activation, which consist of $d$ nodes in the input layer, $m$ nodes in the hidden layer, and one output node. Since only the $m$ weights from the hidden layer to the output node are estimated in our setting, the model is an instance of linear regression. As a result, we show that the redundancy of the…
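
The reduction to linear regression described in the abstract can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: standard Gaussian inputs, randomly drawn but fixed hidden weights `W`, Gaussian noise with level `sigma`, and ordinary least squares in place of the MDL estimator. The helper `relu_features` and all numerical values are hypothetical.

```python
import numpy as np

def relu_features(X, W):
    """Hidden-layer outputs max(X @ W, 0) for fixed hidden weights W (d x m)."""
    return np.maximum(X @ W, 0.0)

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 200                 # sample size, input nodes, hidden nodes

X = rng.standard_normal((n, d))        # assumed input distribution
W = rng.standard_normal((d, m))        # hidden weights: fixed, not estimated
v_true = rng.standard_normal(m) / m    # output weights: the only estimated parameters
sigma = 0.1                            # assumed noise level

# With W fixed, the network output Phi @ v is linear in v.
Phi = relu_features(X, W)              # n x m design matrix
y = Phi @ v_true + sigma * rng.standard_normal(n)

# Ordinary least squares for the output weights (not the MDL estimator).
v_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Empirical Fisher information for v (up to the factor 1 / sigma^2)
# and its eigenvalues in descending order.
fisher = Phi.T @ Phi / n
eigvals = np.linalg.eigvalsh(fisher)[::-1]

# Share of the total eigenvalue mass carried by the d^2 largest eigenvalues.
top_share = eigvals[: d * d].sum() / eigvals.sum()
print(f"share of trace in top {d * d} of {m} eigenvalues: {top_share:.3f}")
```

Printing `top_share` for different values of `m` gives a quick empirical look at how unevenly the eigenvalues of this matrix are spread, which is the property the summary above ties to the small redundancy.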


Taxonomy

Topics: Neural Networks and Applications