When is there a representer theorem? Vector versus matrix regularizers
Andreas Argyriou, Charles Micchelli, Massimiliano Pontil

TL;DR
This paper characterizes when a representer theorem applies to vector and matrix regularizers in machine learning, providing necessary and sufficient conditions and extending the theory to multi-task learning scenarios.
Contribution
It completes the characterization of kernel methods based on regularization by proving that the regularizer must be a nondecreasing function of the inner product for the representer theorem to hold, and it extends the representer theorem to matrix regularizers, motivated by multi-task learning.
Findings
Necessary and sufficient conditions for vector regularizers to satisfy the representer theorem.
Extension of the representer theorem to matrix regularizers in multi-task learning.
Concrete examples illustrating the practical importance of the conditions.
Abstract
We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the inner product then the learned vector is a linear combination of the input data. This result, known as the representer theorem, is at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, thereby completing the characterization of kernel methods based on regularization. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition for this class of matrix regularizers and…
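The sufficiency direction stated in the abstract (the learned vector is a linear combination of the input data) can be checked numerically in the simplest case. The following is a minimal sketch, not taken from the paper: it assumes ridge regression, i.e. squared loss with the regularizer `lam * ||w||^2`, which is a nondecreasing function of the inner product of `w` with itself. The function names, the random data, and the value of `lam` are all hypothetical choices for illustration.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 directly over w (d unknowns)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_dual_coefficients(X, y, lam):
    """Coefficients c such that w = sum_i c_i x_i (m unknowns, Gram matrix only)."""
    m = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(m), y)

# Hypothetical data: fewer samples than dimensions, so the span of the
# inputs is a strict subspace of R^d and the check is not vacuous.
rng = np.random.default_rng(0)
m, d = 5, 20
X = rng.standard_normal((m, d))   # rows are the input vectors x_i
y = rng.standard_normal(m)
lam = 0.1                         # assumed regularization strength

w = ridge_primal(X, y, lam)
c = ridge_dual_coefficients(X, y, lam)
w_from_data = X.T @ c             # linear combination of the inputs

# Orthogonal projector onto span{x_1, ..., x_m}; the part of w outside
# that span should vanish up to floating-point error.
P = X.T @ np.linalg.pinv(X.T)
residual = w - P @ w

print("max |w - X^T c|        :", np.max(np.abs(w - w_from_data)))
print("norm of off-span part  :", np.linalg.norm(residual))
```

Both printed quantities should be at the level of floating-point round-off. The dual solve touches the inputs only through the Gram matrix `X @ X.T`, which is why this property underlies kernel methods. The sketch illustrates only the well-known sufficient condition; it says nothing about the necessity result or the matrix case that the paper establishes.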
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
