Multimodal Representation Learning via Maximization of Local Mutual Information

Ruizhi Liao; Daniel Moyer; Miriam Cha; Keegan Quigley; Seth Berkowitz; Steven Horng; Polina Golland; William M. Wells

arXiv:2103.04537·eess.IV·December 16, 2021

TL;DR

This paper learns image representations by maximizing the mutual information between local image features and the associated free text, using recent neural-network-based mutual information estimators, and evaluates the learned representations on downstream image classification tasks.

Contribution

The paper trains image and text encoders to maximize the local mutual information between image and text features, so that the free-text descriptions of the image findings guide image representation learning.

Findings

01. Improved image classification performance with local mutual information maximization.

02. Demonstrated advantages over global mutual information approaches.

03. Effective use of neural network discriminators for mutual information estimation.

Abstract

We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that the sum of local mutual information is typically a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.
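
As a minimal sketch of how a sum of local mutual information terms could be estimated, the Python below applies the InfoNCE lower bound to each image region separately and adds the results. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a specific estimator, the dot-product critic stands in for a trained neural network discriminator, and the names `info_nce` and `local_mi_objective` as well as the toy features are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product critic (assumption: stands in for a learned discriminator)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(image_feats, text_feats):
    """InfoNCE lower bound on mutual information for a batch of paired vectors.

    image_feats[i] and text_feats[i] are assumed to come from the same
    image-text pair; all other combinations serve as negatives.
    """
    n = len(image_feats)
    total = 0.0
    for i in range(n):
        scores = [dot(image_feats[i], text_feats[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the positive and the negatives.
        m = max(scores)
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_sum
    return math.log(n) + total / n


def local_mi_objective(local_image_feats, text_feats):
    """Sum of per-region InfoNCE estimates (hypothetical aggregation).

    local_image_feats[i][r] is the feature vector of region r in image i.
    """
    n_regions = len(local_image_feats[0])
    return sum(
        info_nce([img[r] for img in local_image_feats], text_feats)
        for r in range(n_regions)
    )


# Toy batch of 3 image-text pairs, 2 regions per image (made-up numbers).
text = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
images = [
    [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # region 0 matches the text, region 1 is uninformative
    [[0.0, 10.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.0, 0.0, 10.0], [1.0, 1.0, 1.0]],
]
objective = local_mi_objective(images, text)
```

Each InfoNCE estimate is bounded above by the logarithm of the batch size, so in this sketch a region whose features carry no information about the text contributes roughly zero, while a well-aligned region contributes close to `log(3)`. In a training loop, the negative of `objective` would serve as the loss for both encoders.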

Code & Models

Repositories

RayRuizhiLiao/mutual_info_img_txt (PyTorch, official)