Representation Learning with Conditional Information Flow Maximization
Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu

TL;DR
This paper introduces a novel information-theoretic framework called conditional information flow maximization that enhances language model representations by balancing information maximization and minimization to improve robustness, transferability, and task performance.
Contribution
It proposes a new representation learning method that maximizes mutual information with labels while minimizing redundant input features, addressing over-compression and feature redundancy issues.
Findings
Improves performance on 13 language understanding benchmarks.
Produces more sufficient, robust, and transferable representations.
Enhances generalization of pre-trained language models.
Abstract
This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) for the target task. First, an information flow maximization principle is proposed to learn more sufficient representations for the input and target by simultaneously maximizing both input-representation and representation-label mutual information. Unlike the information bottleneck, which compresses the input-representation information, we maximize it, avoiding over-compression of the latent representations. Besides, to mitigate the negative effect of potential redundant features from the…
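The information flow maximization principle reverses the sign of one term of the information bottleneck (IB). IB minimizes I(X;Z) − βI(Z;Y), whereas the principle described in the abstract maximizes both I(X;Z) and I(Z;Y). The sketch below is a minimal NumPy illustration of evaluating such an objective, assuming two standard estimators that the abstract does not specify: an InfoNCE lower bound on I(X;Z) from paired input/representation embeddings, and the variational bound I(Z;Y) ≥ H(Y) − CE on a classifier's logits. The function names and the weight `beta` are hypothetical, the conditional redundancy term mentioned in the abstract is not modeled, and a real training loop would compute this inside an autodiff framework rather than NumPy.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def infonce_bound(x_emb: np.ndarray, z: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE lower bound on I(X; Z) from N paired samples (assumed estimator).

    Row i of x_emb and row i of z come from the same input; the other rows
    serve as negatives. The bound equals log N minus the contrastive
    cross-entropy, so it can never exceed log N.
    """
    x = x_emb / np.linalg.norm(x_emb, axis=1, keepdims=True)
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    scores = x @ zn.T / temperature          # scores[i, j] = cos(x_i, z_j) / T
    n = scores.shape[0]
    ce = -np.mean(np.diag(log_softmax(scores)))
    return float(np.log(n) - ce)


def label_bound(logits: np.ndarray, labels: np.ndarray) -> float:
    """Variational lower bound I(Z; Y) >= H(Y) - CE(q(y|z)) (assumed estimator).

    H(Y) is taken from the empirical label frequencies in the batch.
    """
    counts = np.bincount(labels)
    p = counts[counts > 0] / labels.size
    h_y = -np.sum(p * np.log(p))
    ce = -np.mean(log_softmax(logits)[np.arange(labels.size), labels])
    return float(h_y - ce)


def flow_max_loss(x_emb, z, logits, labels, beta: float = 1.0) -> float:
    """Hypothetical loss to minimize: -(I(X;Z) bound + beta * I(Z;Y) bound)."""
    return -(infonce_bound(x_emb, z) + beta * label_bound(logits, labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_emb = rng.normal(size=(8, 16))
    z = x_emb + 0.01 * rng.normal(size=(8, 16))   # representation closely tied to its input
    labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    logits = np.eye(2)[labels] * 5.0              # confident, correct classifier
    print(flow_max_loss(x_emb, z, logits, labels))
```

Maximizing I(X;Z) here rewards representations that keep each input distinguishable from the others in the batch, which is the opposite pressure to IB's compression term; the label term alone would let the encoder discard any input detail that does not help classification.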
Taxonomy
Topics: Neural Networks and Applications · Data Stream Mining Techniques
