A survey on Self Supervised learning approaches for improving Multimodal representation learning

Naman Goyal

arXiv:2210.11024·cs.LG·October 21, 2022·5 cites

Open Access

TL;DR

This survey reviews various self-supervised learning methods for multimodal representation learning, highlighting approaches like cross-modal generation, pretraining, cyclic translation, and unimodal label generation to enhance multimodal models.

Contribution

It provides a comprehensive overview of the state-of-the-art self-supervised techniques specifically tailored for multimodal learning, aggregating diverse methods from recent literature.

Findings

01

Identifies key self-supervised approaches for multimodal learning

02

Highlights the effectiveness of cross-modal pretraining and generation

03

Provides a structured taxonomy of methods

Abstract

Recently, self-supervised learning has seen explosive growth and use in a variety of machine learning tasks because it avoids the cost of annotating large-scale datasets. This paper gives an overview of the best self-supervised learning approaches for multimodal learning. The presented approaches have been aggregated through an extensive study of the literature, and they apply self-supervised learning in different ways. The approaches discussed are cross-modal generation, cross-modal pretraining, cyclic translation, and generating unimodal labels in a self-supervised fashion.

Taxonomy

Topics: Text and Document Classification Technologies · Speech and dialogue systems