Risk Bounds on MDL Estimators for Linear Regression Models with Application to Simple ReLU Neural Networks
Yoshinari Takeishi, Jun'ichi Takeuchi

TL;DR
This paper establishes tight risk bounds for MDL estimators applied to simple two-layer ReLU neural networks, demonstrating that the redundancy is small and the risk bound is independent of the number of hidden layer parameters.
Contribution
The paper introduces a method to design two-stage codes for linear regression models and applies it to simple ReLU neural networks, deriving risk bounds independent of hidden layer size.
Findings
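Risk bounds of order O(d^2 log n / n) for simple ReLU networks, where d is the number of input nodes and n is the sample size.
The redundancy of the two-stage codes is small because the eigenvalues of the Fisher information matrix are strongly biased.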
Risk bounds are independent of the number of hidden layer parameters m.
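The eigenvalue bias mentioned in the findings can be checked numerically. For linear regression with Gaussian noise, the Fisher information with respect to the output weights is proportional to the second-moment matrix of the hidden-layer features, so its spectrum can be estimated from samples. The following is a minimal sketch and not the paper's construction: the standard Gaussian inputs, the unit-norm random hidden weights, the sizes d, m, n, and the cut-off k are all illustrative assumptions.

```python
import numpy as np

def relu_feature_spectrum(d=5, m=200, n=2000, seed=0):
    """Eigenvalues (descending) of the empirical second-moment matrix
    of ReLU hidden-layer features. All settings are assumptions."""
    rng = np.random.default_rng(seed)
    # Assumed: hidden weights drawn at random, then scaled to unit norm.
    W = rng.standard_normal((m, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Assumed: inputs are standard Gaussian vectors.
    X = rng.standard_normal((n, d))
    Phi = np.maximum(X @ W.T, 0.0)   # ReLU features, shape (n, m)
    F = Phi.T @ Phi / n              # empirical second-moment matrix, shape (m, m)
    # eigvalsh returns ascending eigenvalues for symmetric matrices; reverse them.
    return np.linalg.eigvalsh(F)[::-1]

d = 5
eigvals = relu_feature_spectrum(d=d)
# Illustrative cut-off: count of constant, linear, and quadratic directions in d inputs.
k = 1 + d + d * (d + 1) // 2
top_share = eigvals[:k].sum() / eigvals.sum()
print(f"top {k} of {eigvals.size} eigenvalues carry {top_share:.1%} of the trace")
```

Under these assumptions, a small group of leading eigenvalues carries most of the trace even though the matrix has m rows and columns. This is the kind of skew the second finding refers to.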
Abstract
To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds of MDL estimators based on two-stage codes for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we propose a method to design two-stage codes for linear regression models and establish an upper bound on the risk of the corresponding MDL estimators, based on the theory of MDL estimators originated by Barron and Cover (1991). Then, we apply this result to simple two-layer NNs with ReLU activation, which consist of d nodes in the input layer, m nodes in the hidden layer, and one output node. Since the object of estimation is only the weights from the hidden layer to the output node in our setting, this is an instance of a linear regression model. As a result, we show that the redundancy of the…
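The abstract's remark that estimating only the hidden-to-output weights gives a linear regression model can be made concrete with a short sketch. Everything here is an assumption for illustration: the sizes, the Gaussian data, the noise level, and the names `hidden_features` and `predict` are hypothetical. Ordinary least squares is used as a stand-in estimator. It is not the two-stage-code MDL estimator analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 50, 500                      # illustrative sizes (assumed)

W = rng.standard_normal((m, d))           # hidden weights: fixed, not estimated
v_true = rng.standard_normal(m) / m       # hypothetical true output weights
X = rng.standard_normal((n, d))           # assumed Gaussian inputs

def hidden_features(X, W):
    """ReLU activations of the hidden layer, shape (n, m)."""
    return np.maximum(X @ W.T, 0.0)

def predict(X, W, v):
    """Network output: a linear function of v once W is fixed."""
    return hidden_features(X, W) @ v

# Noisy targets generated from the model (noise level is an assumption).
y = predict(X, W, v_true) + 0.1 * rng.standard_normal(n)

# With W fixed, fitting v is a linear least-squares problem in the features.
Phi = hidden_features(X, W)
v_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

rss_hat = np.sum((y - Phi @ v_hat) ** 2)
rss_true = np.sum((y - Phi @ v_true) ** 2)
print(f"residual sum of squares: fitted {rss_hat:.3f}, true weights {rss_true:.3f}")
```

The ReLU nonlinearity acts only on the inputs, so the model is nonlinear in X but linear in the estimated parameter v. That is why the regression theory applies.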
Taxonomy
Topics: Neural Networks and Applications
