Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation

HaeChun Chung; JooYong Shim; Jong-Kook Kim

arXiv:2207.12121·cs.SD·July 26, 2022

Open Access

TL;DR

This paper introduces Cross-Modal Contrastive Representation Learning (CMCRL), a method for cross-modal audio-to-image generation that uses contrastive learning to extract useful audio features and thereby improve the quality of the generated images.

Contribution

The paper proposes a contrastive learning framework for audio-to-image generation that extracts more useful audio features and yields higher-quality images than previous methods.

Findings

01 CMCRL improves the quality of images generated from audio.

02 In the reported experiments, CMCRL outperforms previous approaches.

03 Contrastive learning extracts useful features from audio data for the generation phase.

Abstract

Multiple modalities of the same information provide a variety of perspectives on it, which can improve understanding of that information. It may therefore be crucial to generate data in a different modality from existing data to enhance understanding. In this paper, we investigate the cross-modal audio-to-image generation problem and propose Cross-Modal Contrastive Representation Learning (CMCRL) to extract useful features from audio and use them in the generation phase. Experimental results show that CMCRL improves the quality of generated images compared with previous research.
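The abstract does not spell out the loss that CMCRL uses, so the following is only a minimal sketch of a generic cross-modal contrastive objective (a symmetric InfoNCE loss), not the authors' formulation. It assumes each audio clip in a batch has exactly one paired image, and that both have already been encoded into same-sized embedding vectors. The function names `l2_normalize` and `info_nce` and the `temperature` value are hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(audio_embs, image_embs, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    audio_embs[i] and image_embs[i] are treated as a positive pair;
    every other combination in the batch is treated as a negative.
    This is an illustrative assumption, not the loss from the paper.
    """
    a = [l2_normalize(v) for v in audio_embs]
    b = [l2_normalize(v) for v in image_embs]
    n = len(a)

    # logits[i][j] = cosine similarity of audio i and image j, scaled.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    audio_to_image = cross_entropy_diag(logits)
    image_to_audio = cross_entropy_diag([list(col) for col in zip(*logits)])
    return 0.5 * (audio_to_image + image_to_audio)


# Toy usage: aligned pairs should give a lower loss than shuffled pairs.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
shuffled_images = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(audio, aligned_images)
loss_shuffled = info_nce(audio, shuffled_images)
```

Minimizing a loss of this kind pulls each audio embedding toward its paired image embedding and pushes it away from the other images in the batch, which is one common way to obtain audio features that carry image-relevant information.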

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies