Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation
Gwangbeen Park, Woobin Im

TL;DR
This paper introduces an adversarial backpropagation method for image-text multi-modal representation learning that uses only category labels instead of image-text pair data, and yields features that carry universal semantic information.
Contribution
To the authors' knowledge, it is the first work to apply adversarial learning to multi-modal learning without using image-text pairs, enabling universal semantic feature extraction.
Findings
Multi-modal features can be learned without image-text pairs.
The method makes image and text feature distributions more similar in the multi-modal space than methods that use image-text pairs.
Features contain universal semantic information even when trained only for category prediction.
Abstract
We present a novel method for image-text multi-modal representation learning. To our knowledge, this work is the first approach that applies the concept of adversarial learning to multi-modal learning without exploiting image-text pair information to learn multi-modal features. We use only category information, in contrast with most previous methods, which use image-text pair information for multi-modal embedding. In this paper, we show that multi-modal features can be learned without image-text pair information, and that our method makes the image and text distributions in the multi-modal feature space more similar than other methods that use image-text pair information. We also show that our multi-modal features carry universal semantic information, even though they were trained for category prediction. Our model is trained end-to-end by backpropagation, is intuitive, and is easily extended to other multi-modal learning tasks.
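The abstract does not spell out how the adversarial signal is backpropagated. One common way to realize adversarial backpropagation is a gradient reversal layer between a shared encoder and a modality discriminator; whether the paper uses exactly this form is an assumption here. The sketch below works through the gradient for a toy scalar model by hand. All names (`encoder_gradient`, `w`, `a`, `d`, `lam`) and the squared-error category loss are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grl_forward(feature):
    # Gradient reversal layer, forward pass: identity.
    return feature


def grl_backward(grad_output, lam):
    # Gradient reversal layer, backward pass: flip the sign and scale by lam.
    return -lam * grad_output


def encoder_gradient(x, w, a, y, d, m, lam):
    """Gradient of the combined objective w.r.t. the encoder weight w.

    Toy model (an assumption, not the paper's architecture):
      feature              f = w * x
      category head        loss_c = 0.5 * (a * f - y) ** 2
      modality classifier  p = sigmoid(d * f), binary cross-entropy against m
                           (m = 0 for image, m = 1 for text)
    """
    f = w * x
    # Category loss gradient w.r.t. the feature.
    grad_f_cat = (a * f - y) * a
    # Discriminator loss gradient w.r.t. the feature (logistic + cross-entropy).
    p = sigmoid(d * grl_forward(f))
    grad_f_dis = (p - m) * d
    # The discriminator gradient is reversed before it reaches the encoder,
    # so the encoder is pushed to make the modalities harder to tell apart.
    grad_f = grad_f_cat + grl_backward(grad_f_dis, lam)
    return grad_f * x


if __name__ == "__main__":
    # A text sample (m = 1) whose category is already predicted correctly.
    print(encoder_gradient(x=2.0, w=0.5, a=1.0, y=1.0, d=1.0, m=1.0, lam=0.0))
    print(encoder_gradient(x=2.0, w=0.5, a=1.0, y=1.0, d=1.0, m=1.0, lam=1.0))
```

With `lam=0.0` the encoder sees only the category loss. With `lam=1.0` the reversed discriminator gradient moves the encoder in the direction that increases the discriminator's loss, which is the mechanism that would pull image and text features toward a shared distribution using category labels alone.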
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
