Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures
Jorge F. Silva, Victor Faraggi, Camilo Ramirez, Alvaro Egana, and Eduardo Pavez

TL;DR
This paper uses information theory to analyze encoder-decoder structures in machine learning, providing new models, characterizations, and insights into their expressive power and design principles.
Contribution
It introduces an information-theoretic framework built on two concepts, information sufficiency (IS) and mutual information loss (MIL), to characterize and evaluate encoder-decoder models in ML.
Findings
Characterizes probabilistic models consistent with IS encoder-decoder structures.
Quantifies performance loss due to encoder-decoder design using mutual information loss.
Establishes conditions for universal cross-entropy learning with encoder-decoder architectures.
Abstract
We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the…
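The two information concepts named in the abstract can be illustrated on a toy discrete problem. The sketch below assumes that MIL for an encoder η is the gap I(X;Y) − I(η(X);Y), and that IS corresponds to this gap being zero. That reading is consistent with the abstract but is not spelled out in this summary. The function names and the toy distribution are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def push_forward(joint, encoder):
    """Joint pmf of (encoder(X), Y) induced by the joint pmf of (X, Y)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(encoder(x), y)] += p
    return dict(out)


def mutual_information_loss(joint, encoder):
    """Assumed definition: MIL = I(X;Y) - I(encoder(X);Y)."""
    return mutual_information(joint) - mutual_information(
        push_forward(joint, encoder)
    )


# Hypothetical toy model: X uniform on {0, 1, 2, 3}, Y binary.
# P(Y=1 | X=x) depends on x only through x // 2.
p_y1 = {0: 0.9, 1: 0.9, 2: 0.2, 3: 0.2}
joint = {}
for x, q in p_y1.items():
    joint[(x, 1)] = 0.25 * q
    joint[(x, 0)] = 0.25 * (1.0 - q)

mil_sufficient = mutual_information_loss(joint, lambda x: x // 2)
mil_lossy = mutual_information_loss(joint, lambda x: x % 2)
print("MIL with encoder x // 2:", mil_sufficient)
print("MIL with encoder x % 2:", mil_lossy)
```

In this toy model the encoder `x // 2` keeps everything X says about Y, so its loss is zero up to floating-point error. The encoder `x % 2` makes the representation independent of Y, so its loss equals the full I(X;Y).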
Taxonomy
Topics: Neural Networks and Applications
