Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

Semih Kaya; Elif Vural

arXiv:2006.02330·cs.LG·May 5, 2021

TL;DR

This paper analyzes multi-modal nonlinear embeddings theoretically, shows that the regularity of the interpolation functions matters for generalization to unseen data, and introduces an algorithm for multi-modal classification and retrieval.

Contribution

The paper derives theoretical performance bounds for multi-modal nonlinear embeddings in a supervised setting and proposes an algorithm, motivated by these bounds, that optimizes the training embeddings jointly with Lipschitz regularity to improve generalization.

Findings

01. The performance bounds indicate that the regularity of the interpolation functions is as important for generalization as between-class separation and cross-modal alignment.

02. The proposed algorithm improves multi-modal classification accuracy.

03. Experimental results show promising performance in image classification and retrieval.

Abstract

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the…
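The role of Lipschitz regularity can be illustrated with a small sketch. The abstract as shown here does not specify the form of the interpolation functions, so the Gaussian kernel map below (`make_gaussian_map`, with its `centers`, `coeffs`, and `sigma`) is a hypothetical stand-in and not the authors' method. The helper `empirical_lipschitz` is also an assumption for illustration: it only computes a lower bound on the Lipschitz constant from sample pairs.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def empirical_lipschitz(f, points):
    """Largest ratio ||f(x) - f(y)|| / ||x - y|| over all pairs of points.

    This is a lower bound on the true Lipschitz constant of f.
    """
    best = 0.0
    for x, y in combinations(points, 2):
        d_in = euclidean(x, y)
        if d_in == 0.0:
            continue  # skip duplicate points
        best = max(best, euclidean(f(x), f(y)) / d_in)
    return best


def make_gaussian_map(centers, coeffs, sigma):
    """Hypothetical map f(x) = sum_i coeffs[i] * exp(-||x - c_i||^2 / sigma^2)."""
    def f(x):
        out = [0.0] * len(coeffs[0])
        for c, w in zip(centers, coeffs):
            k = math.exp(-euclidean(x, c) ** 2 / sigma ** 2)
            for j, wj in enumerate(w):
                out[j] += wj * k
        return out
    return f


# Toy data: 1-D inputs on a grid in [0, 2], mapped to a 1-D embedding.
grid = [[i / 10.0] for i in range(21)]
centers = [[0.5], [1.5]]
coeffs = [[1.0], [-1.0]]

# Same centers and coefficients, different kernel widths.
smooth = make_gaussian_map(centers, coeffs, sigma=1.0)
sharp = make_gaussian_map(centers, coeffs, sigma=0.2)

L_smooth = empirical_lipschitz(smooth, grid)
L_sharp = empirical_lipschitz(sharp, grid)
print("wide kernel:", L_smooth, "narrow kernel:", L_sharp)
```

In this toy setting both maps send the two centers to well-separated values, but the narrow-kernel map changes much faster between nearby inputs. A new sample close to a training point can then land far from that point's embedding, which is the intuition behind treating interpolator regularity as a generalization criterion alongside separation and alignment.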
