Learning Curves for Mutual Information Maximization

Robert Urbanczik

arXiv:cond-mat/0305254 · cond-mat.dis-nn · November 10, 2009

TL;DR

This paper analyzes an unsupervised learning method that maximizes the mutual information between the outputs of two networks. It shows that, in the large-sample limit, the method recognizes the structure in the data, computes learning curves for perceptron-like networks, and considers a way to regularize the procedure.

Contribution

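It provides a theoretical analysis of mutual information maximization: a large-sample result for a generic data model, and learning curves for zero-temperature Gibbs learning in a restricted model where the networks resemble perceptrons, together with a regularization of the procedure.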
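
For reference, the quantity being maximized is the standard mutual information between the two network outputs. The sketch below assumes binary (±1) outputs of perceptron-like networks, written here as σ = sgn(w·x) and τ = sgn(v·y) with weight vectors w and v; this notation is an assumption for illustration and is not taken from the paper.

```latex
I(\sigma;\tau) = \sum_{\sigma,\tau \in \{-1,+1\}} p(\sigma,\tau)\,
  \ln\frac{p(\sigma,\tau)}{p(\sigma)\,p(\tau)}
  = H(\sigma) + H(\tau) - H(\sigma,\tau),
\qquad \sigma = \operatorname{sgn}(w \cdot x),\quad \tau = \operatorname{sgn}(v \cdot y).
```

Because each output is binary, 0 ≤ I(σ;τ) ≤ ln 2; the upper bound is reached only when each output is balanced and each output determines the other.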

Findings

01

Mutual information maximization recognizes the structure in the data in the large-sample limit.

02

For perceptron-like networks trained by zero-temperature Gibbs learning, the learning curves show that convergence can be rather slow.

03

A way of regularizing the procedure is considered to improve learning efficiency.

Abstract

An unsupervised learning procedure based on maximizing the mutual information between the outputs of two networks receiving different but statistically dependent inputs is analyzed (Becker and Hinton, Nature 355, 161 (1992)). For a generic data model, I show that in the large sample limit the structure in the data is recognized by mutual information maximization. For a more restricted model, where the networks are similar to perceptrons, I calculate the learning curves for zero-temperature Gibbs learning. These show that convergence can be rather slow, and a way of regularizing the procedure is considered.
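
The procedure in the abstract can be illustrated with a small numerical sketch. It assumes a hypothetical toy data model in which two noisy input vectors x and y share a hidden sign along their first component, and two perceptrons output sgn(w·x) and sgn(v·y). The sketch estimates the mutual information of the two binary outputs from a sample and increases it by random-perturbation hill climbing. The hill climbing is a swapped-in optimizer chosen for brevity, not the zero-temperature Gibbs learning analyzed in the paper, and every function name below is hypothetical.

```python
import math
import random


def sign(u):
    """Perceptron-style binary output: +1 or -1 (ties mapped to +1)."""
    return 1 if u >= 0 else -1


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def empirical_mi(sigmas, taus):
    """Plug-in estimate (in nats) of I(sigma; tau) for two sequences of +/-1 outputs."""
    n = len(sigmas)
    joint = {}
    for s, t in zip(sigmas, taus):
        joint[(s, t)] = joint.get((s, t), 0) + 1
    # Marginal frequencies of each output value.
    p_s = {s: sum(c for (a, _), c in joint.items() if a == s) / n for s in (-1, 1)}
    p_t = {t: sum(c for (_, b), c in joint.items() if b == t) / n for t in (-1, 1)}
    mi = 0.0
    for (s, t), c in joint.items():
        p = c / n
        mi += p * math.log(p / (p_s[s] * p_t[t]))
    return mi


def make_pairs(n_dim, n_samples, signal, rng):
    """Hypothetical data model: x and y are Gaussian noise except that both
    carry the same hidden sign s, scaled by `signal`, on their first component."""
    pairs = []
    for _ in range(n_samples):
        s = rng.choice((-1, 1))
        x = [rng.gauss(0, 1) for _ in range(n_dim)]
        y = [rng.gauss(0, 1) for _ in range(n_dim)]
        x[0] += signal * s
        y[0] += signal * s
        pairs.append((x, y))
    return pairs


def objective(w, v, pairs):
    """Empirical MI between the outputs sgn(w.x) and sgn(v.y) over the sample."""
    sigmas = [sign(dot(w, x)) for x, _ in pairs]
    taus = [sign(dot(v, y)) for _, y in pairs]
    return empirical_mi(sigmas, taus)


def hill_climb(pairs, n_dim, steps, step_size, rng):
    """Stand-in optimizer (not the paper's Gibbs learning): keep a random
    perturbation of both weight vectors whenever it does not lower the MI."""
    w = [rng.gauss(0, 1) for _ in range(n_dim)]
    v = [rng.gauss(0, 1) for _ in range(n_dim)]
    best = objective(w, v, pairs)
    for _ in range(steps):
        w_new = [wi + step_size * rng.gauss(0, 1) for wi in w]
        v_new = [vi + step_size * rng.gauss(0, 1) for vi in v]
        value = objective(w_new, v_new, pairs)
        if value >= best:
            w, v, best = w_new, v_new, value
    return w, v, best


def overlap(w):
    """Cosine between w and the hidden first axis of the toy model."""
    return w[0] / math.sqrt(dot(w, w))


if __name__ == "__main__":
    rng = random.Random(0)
    n_dim = 10
    pairs = make_pairs(n_dim, n_samples=200, signal=2.0, rng=rng)
    w, v, mi = hill_climb(pairs, n_dim, steps=200, step_size=0.3, rng=rng)
    print(f"empirical MI: {mi:.3f} nats (upper bound ln 2 = {math.log(2):.3f})")
    # Negating either output leaves the MI unchanged, so only |overlap| matters.
    print(f"|overlap| of w: {abs(overlap(w)):.2f}, of v: {abs(overlap(v)):.2f}")
```

In this toy model the only shared structure is the first axis, so the printed overlaps indicate how far maximizing the empirical mutual information has aligned both weight vectors with it. The numbers depend on the seed, the sample size, and the number of steps, which loosely mirrors the finite-sample learning-curve question the paper studies analytically.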
