Target-Oriented Deformation of Visual-Semantic Embedding Space

Takashi Matsubara

arXiv:1910.06514 · cs.CV · October 16, 2019 · 1 citation

TL;DR

This paper introduces TOD-Net, a post-processing module that deforms the embedding space to improve cross-modal retrieval by emphasizing entity-specific concepts and handling diversity.

Contribution

The paper presents TOD-Net, a novel deformation network that enhances existing multimodal embeddings for better cross-modal retrieval performance.

Findings

01 Achieves state-of-the-art results on the MSCOCO dataset.

02 Effectively emphasizes entity-specific concepts.

03 Handles higher diversity in retrieval tasks.

Abstract

Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and modality-specific information, merely embedding related entities close to each other is insufficient. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby adjusting similarities between entities. Unlike methods based on cross-modal attention, TOD-Net is a post-processing step applied to the embedding space learned by existing embedding systems, and it improves their retrieval performance. In particular, when combined with cutting-edge models, TOD-Net…
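The idea of deforming an already-learned embedding space under a condition can be illustrated with a small sketch. This is not the architecture from the paper: the element-wise affine map, the weight matrices `W_scale` and `W_shift`, the choice of the caption embedding as the condition, and the random vectors standing in for pretrained embeddings are all assumptions made for illustration. The sketch only shows that a continuous, invertible, condition-dependent map applied after embedding can change the similarity between two entities.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def deform(x, cond, W_scale, W_shift):
    """Hypothetical deformation: a per-dimension scale and shift predicted
    from the condition. The scale is always positive, so for a fixed
    condition the map is continuous and invertible in x."""
    log_scale = [0.5 * math.tanh(s) for s in matvec(W_scale, cond)]
    shift = matvec(W_shift, cond)
    return [a * math.exp(ls) + b for a, ls, b in zip(x, log_scale, shift)]


def undeform(y, cond, W_scale, W_shift):
    """Inverse of deform for the same condition."""
    log_scale = [0.5 * math.tanh(s) for s in matvec(W_scale, cond)]
    shift = matvec(W_shift, cond)
    return [(a - b) * math.exp(-ls) for a, ls, b in zip(y, log_scale, shift)]


def random_matrix(rng, dim, std=0.3):
    """Random weights standing in for parameters that would be trained."""
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim = 4
W_scale = random_matrix(rng, dim)
W_shift = random_matrix(rng, dim)

# Stand-ins for embeddings produced by an existing, frozen embedding system.
image = [rng.gauss(0.0, 1.0) for _ in range(dim)]
caption = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Similarity in the original space versus the space deformed under the
# condition (here, hypothetically, the caption embedding).
base_sim = cosine(image, caption)
deformed_sim = cosine(
    deform(image, caption, W_scale, W_shift),
    deform(caption, caption, W_scale, W_shift),
)

# The deformation loses no information: it can be undone exactly.
recovered = undeform(deform(image, caption, W_scale, W_shift),
                     caption, W_scale, W_shift)

print(f"similarity before deformation: {base_sim:.4f}")
print(f"similarity after deformation:  {deformed_sim:.4f}")
```

In a real system the weights would be trained with a retrieval loss on top of frozen embeddings, which is what makes such a module a post-process rather than a replacement for the embedding model.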

Code & Models

Repositories

fartashf/vsepp (PyTorch, Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques