Mutual Information Learned Regressor: an Information-theoretic Viewpoint of Training Regression Systems

Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao, Yusen He, Yaohua Wang

arXiv:2211.12685·stat.ML·November 24, 2022

Open Access

TL;DR

This paper introduces a mutual information-based framework for regression, providing theoretical convergence analysis and demonstrating that high-dimensional data can be advantageous under this approach.

Contribution

It extends mutual information supervised learning to regression, offers a convergence analysis for SGD, and derives a generalization bound highlighting benefits of high dimensionality.

Findings

01 MSE minimization is equivalent to conditional entropy learning.

02 Proposed mutual information formulation for regression with reparameterization.

03 High dimensionality can improve generalization performance.
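The first finding can be illustrated with a small numerical check. The sketch below is not the paper's derivation; it assumes the standard setting where the conditional model is Gaussian with a fixed noise level `sigma`, in which case the average negative log-likelihood (an empirical conditional cross-entropy) is an affine function of the MSE, so both are minimized by the same predictor. The toy data, the candidate slopes, and the helper names `mse` and `gaussian_nll` are all hypothetical.

```python
import math
import random


def mse(ys, preds):
    """Mean square error between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gaussian_nll(ys, preds, sigma):
    """Average negative log-likelihood of ys under N(pred, sigma^2).

    This is an empirical conditional cross-entropy for a Gaussian model
    with fixed standard deviation sigma (an assumption of this sketch).
    """
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    total = sum(const + (y - p) ** 2 / (2 * sigma ** 2) for y, p in zip(ys, preds))
    return total / len(ys)


# Hypothetical toy data: y = 2x + Gaussian noise.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(500)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

sigma = 0.5
const = 0.5 * math.log(2 * math.pi * sigma ** 2)
slopes = [1.0, 1.5, 2.0, 2.5]  # candidate linear predictors y_hat = w * x

mse_vals = [mse(ys, [w * x for x in xs]) for w in slopes]
nll_vals = [gaussian_nll(ys, [w * x for x in xs], sigma) for w in slopes]

# NLL = MSE / (2 sigma^2) + const, so the ranking of candidates is identical.
best_by_mse = slopes[mse_vals.index(min(mse_vals))]
best_by_nll = slopes[nll_vals.index(min(nll_vals))]
print(best_by_mse, best_by_nll)
```

Because the relation is affine with a positive scale, any predictor that lowers the MSE also lowers this cross-entropy estimate by a proportional amount. The equivalence in this form depends on holding `sigma` fixed.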

Abstract

As one of the central tasks in machine learning, regression has many applications across different fields. A common practice for solving regression problems is mean square error (MSE) minimization, or its regularized variants, which require prior knowledge about the models. Recently, Yi et al. proposed a mutual information based supervised learning framework in which they introduced a label entropy regularization that does not require any prior knowledge. When applied to classification tasks and solved via a stochastic gradient descent (SGD) optimization algorithm, their approach achieved significant improvement over the commonly used cross entropy loss and its variants. However, they did not provide a theoretical convergence analysis of the SGD algorithm for the proposed formulation. Besides, applying the framework to regression tasks is nontrivial due to the…
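For orientation, the standard identity relating mutual information to the label entropy and the conditional entropy is reproduced here. The excerpt does not show how the paper turns this into a regression objective, so the notation is an assumption: X is the input, Y the continuous label, and h denotes differential entropy.

```latex
I(X;Y) = h(Y) - h(Y \mid X)
```

Read this way, maximizing mutual information both lowers the conditional entropy of the label given the input and keeps the label entropy term in play, which is one way to see where a label entropy regularizer could enter.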

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent · Entropy Regularization