Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression
Joseph Shenouda, Rahul Parhi, Kangwook Lee, Robert D. Nowak

TL;DR
This paper develops a theoretical framework using vector-valued variation spaces to analyze multi-output neural networks, providing insights into multi-task learning and a new approach for network compression.
Contribution
It introduces vector-valued variation spaces, a representer theorem for shallow networks, and links weight decay to multi-task lasso, advancing understanding of multi-task learning and network compression.
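The link between weight decay and a lasso-type penalty rests on the positive homogeneity of the ReLU. The following is a minimal sketch of that rescaling argument for a single neuron, not code from the paper; the function names (`neuron`, `weight_decay`, `balanced`) and the toy numbers are assumptions chosen for illustration. It shows that scaling the input weights by c > 0 and the output weights by 1/c leaves the neuron's function unchanged, and that the smallest weight-decay cost over such rescalings equals the product of the two norms. If the input weights are normalized to unit norm, that product reduces to the norm of the output weight vector, which is the per-group term of a multi-task lasso penalty.

```python
import math

def relu(z):
    return max(0.0, z)

def l2(u):
    return math.sqrt(sum(ui * ui for ui in u))

def neuron(v, w, x):
    """One ReLU neuron with vector-valued output: v * relu(<w, x>)."""
    pre = sum(wi * xi for wi, xi in zip(w, x))
    return [vj * relu(pre) for vj in v]

def weight_decay(v, w):
    """Weight-decay cost of a single neuron: (||v||^2 + ||w||^2) / 2."""
    return 0.5 * (l2(v) ** 2 + l2(w) ** 2)

def balanced(v, w):
    """Rescale (v, w) -> (v / c, c * w) with c chosen so both norms match.

    ReLU is positively homogeneous, so the neuron's output is unchanged.
    By the AM-GM inequality this choice of c minimizes the weight-decay cost.
    """
    c = math.sqrt(l2(v) / l2(w))
    return [vj / c for vj in v], [wi * c for wi in w]

# Toy values (hypothetical): two tasks share one neuron.
v = [3.0, -4.0]   # output weights, one entry per task, norm 5
w = [0.6, 0.8]    # input weights, norm 1
x = [1.0, 2.0]    # an arbitrary input

v_b, w_b = balanced(v, w)
print(weight_decay(v, w))      # cost before rescaling
print(weight_decay(v_b, w_b))  # cost after rescaling
print(l2(v) * l2(w))           # product of norms
```

The rescaled cost matches the product of norms up to floating-point error, while the neuron's output at `x` is the same before and after.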
Findings
Shallow vector-valued networks are solutions to data-fitting problems posed over infinite-dimensional spaces, with widths bounded by the square of the number of training data.
Norms in these spaces promote multi-task feature learning.
A convex method for deep network compression is proposed and evaluated.
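To illustrate why a multi-task lasso penalty favors features shared across tasks and can shrink a layer, here is a minimal sketch of group soft-thresholding, the standard proximal step for a penalty that sums the Euclidean norms of the rows of a weight matrix. This is a generic illustration, not the compression procedure evaluated in the paper; the function name `group_soft_threshold`, the threshold `t`, and the toy matrix are assumptions. Each row holds one neuron's output weights across all tasks, so a row driven to zero removes that neuron for every task at once.

```python
import math

def group_soft_threshold(V, t):
    """Proximal operator of t * sum_k ||V[k]||_2, applied row by row.

    Rows with norm at most t are set to zero. Other rows are shrunk
    toward zero by the factor (1 - t / norm).
    """
    out = []
    for row in V:
        norm = math.sqrt(sum(a * a for a in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * a for a in row])
    return out

# Toy output-weight matrix (hypothetical): 3 neurons, 2 tasks.
V = [
    [3.0, 4.0],    # strong neuron used by both tasks
    [0.3, -0.4],   # weak neuron, norm below the threshold
    [0.0, 2.0],    # neuron used by the second task only
]

V_shrunk = group_soft_threshold(V, t=1.0)

# Neurons whose whole row survived the threshold.
active = [k for k, row in enumerate(V_shrunk) if any(a != 0.0 for a in row)]
print(active)
```

In this toy case the weak middle neuron is removed entirely, which is the sense in which a row-wise penalty can reduce the width of a layer.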
Abstract
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay in training networks with activations like the rectified linear unit (ReLU). This framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are the solutions to data-fitting problems over these infinite-dimensional spaces, where the network widths are bounded by the square of the number of training data. This observation reveals that the norm associated with these…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Weight Decay
