Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures

Jorge F. Silva; Victor Faraggi; Camilo Ramirez; Alvaro Egana; and Eduardo Pavez

arXiv:2405.20452·cs.LG·June 3, 2024·1 cites

Open Access

TL;DR

This paper uses information theory to analyze encoder-decoder structures in machine learning, providing new models, characterizations, and insights into their expressive power and design principles.

Contribution

The paper introduces an information-theoretic framework built on two concepts, information sufficiency (IS) and mutual information loss (MIL), to characterize and evaluate encoder-decoder models in ML.

Findings

01. Characterizes probabilistic models consistent with IS encoder-decoder structures.

02. Quantifies performance loss due to encoder-decoder design using mutual information loss.

03. Establishes conditions for universal cross-entropy learning with encoder-decoder architectures.

Abstract

We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the…
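The abstract names IS and MIL without giving formulas. A minimal sketch of how the two notions could be checked on a toy discrete problem follows. It assumes the common reading in which a deterministic encoder U = η(X) is information sufficient when I(U;Y) = I(X;Y), and in which MIL is the gap I(X;Y) − I(U;Y). The function names, the index-list encoding of η, and the toy joint distribution are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in nats for a joint pmf given as a 2-D array (rows: A, cols: B)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal of A, shape (n_a, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal of B, shape (1, n_b)
    prod = pa * pb                          # product of marginals
    mask = joint > 0                        # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

def encode_joint(joint_xy, encoder):
    """Joint pmf of (U, Y) where U = encoder[x] is a deterministic map on X indices."""
    joint_xy = np.asarray(joint_xy, dtype=float)
    joint_uy = np.zeros((max(encoder) + 1, joint_xy.shape[1]))
    for x, u in enumerate(encoder):
        joint_uy[u] += joint_xy[x]          # pool the mass of every x sent to u
    return joint_uy

def mutual_information_loss(joint_xy, encoder):
    """Assumed definition: MIL = I(X;Y) - I(encoder(X);Y), always >= 0."""
    return mutual_information(joint_xy) - mutual_information(
        encode_joint(joint_xy, encoder)
    )

# Hypothetical toy problem: X takes 4 values, Y takes 2 values.
joint_xy = np.array([[0.20, 0.05],
                     [0.20, 0.05],
                     [0.05, 0.20],
                     [0.05, 0.20]])

sufficient = [0, 0, 1, 1]   # merges inputs that share the same P(Y | x)
lossy = [0, 1, 0, 1]        # merges inputs with different P(Y | x)

mil_sufficient = mutual_information_loss(joint_xy, sufficient)
mil_lossy = mutual_information_loss(joint_xy, lossy)
print(mil_sufficient, mil_lossy)
```

Under these assumptions, the first encoder loses essentially nothing because it only merges inputs with identical posteriors over Y, while the second makes U independent of Y and so loses all of I(X;Y).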

Taxonomy

Topics: Neural Networks and Applications