Mutual Information Learned Regressor: an Information-theoretic Viewpoint of Training Regression Systems
Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao, Yusen He, Yaohua Wang

TL;DR
This paper introduces a mutual information-based framework for regression, providing theoretical convergence analysis and demonstrating that high-dimensional data can be advantageous under this approach.
Contribution
It extends mutual information supervised learning to regression, offers a convergence analysis for SGD, and derives a generalization bound highlighting benefits of high dimensionality.
Findings
MSE minimization is equivalent to conditional entropy learning.
The paper proposes a mutual information formulation for regression that uses reparameterization.
High dimensionality can improve generalization performance.
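The first finding can be illustrated numerically. The sketch below assumes a Gaussian conditional model p(y | x) = N(f(x), sigma^2) with a fixed, known sigma; this is a standard textbook setting and not necessarily the exact assumption used in the paper. Under it, the empirical conditional cross-entropy (average negative log-likelihood) is an affine function of the MSE, so both objectives pick the same model. The helper names, the toy data, and the one-parameter model y = w * x are hypothetical and chosen only for illustration.

```python
import math
import random


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gaussian_nll(ys, preds, sigma):
    """Average negative log-likelihood of ys under N(pred, sigma^2).

    This is the empirical conditional cross-entropy for a Gaussian
    model with fixed standard deviation sigma (an assumption here).
    """
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    total = sum(const + (y - p) ** 2 / (2 * sigma ** 2) for y, p in zip(ys, preds))
    return total / len(ys)


# Toy data: y = 2x + Gaussian noise (hypothetical example data).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]

sigma = 0.5
const = 0.5 * math.log(2 * math.pi * sigma ** 2)

# Evaluate both objectives on a grid of candidate slopes w.
slopes = [i / 10 for i in range(41)]
mse_vals = [mse(ys, [w * x for x in xs]) for w in slopes]
nll_vals = [gaussian_nll(ys, [w * x for x in xs], sigma) for w in slopes]

# NLL = const + MSE / (2 sigma^2), so the gap should be numerically zero.
max_gap = max(
    abs(n - (const + m / (2 * sigma ** 2))) for n, m in zip(nll_vals, mse_vals)
)

# Both objectives select the same slope on the grid.
best_mse = slopes[mse_vals.index(min(mse_vals))]
best_nll = slopes[nll_vals.index(min(nll_vals))]

print("max gap:", max_gap)
print("best slope (MSE):", best_mse, "best slope (NLL):", best_nll)
```

Because the two objectives differ only by a positive scale factor and an additive constant, any optimizer, including SGD, follows the same descent directions for both up to a rescaling of the step size.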
Abstract
As one of the central tasks in machine learning, regression has many applications across different fields. A common practice for solving regression problems is mean square error (MSE) minimization or its regularized variants, which require prior knowledge about the models. Recently, Yi et al. proposed a mutual information based supervised learning framework in which they introduced a label entropy regularization that does not require any prior knowledge. When applied to classification tasks and solved via a stochastic gradient descent (SGD) optimization algorithm, their approach achieved significant improvement over the commonly used cross entropy loss and its variants. However, they did not provide a theoretical convergence analysis of the SGD algorithm for the proposed formulation. Besides, applying the framework to regression tasks is nontrivial due to the…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent · Entropy Regularization
