Diachronic Cross-modal Embeddings

David Semedo; João Magalhães

arXiv:1909.13689 · cs.MM · October 1, 2019

TL;DR

This paper introduces a diachronic cross-modal embedding (DCM) that captures temporal semantic shifts in multimodal data, enabling better organization and understanding of visual-textual interactions over time.

Contribution

The paper proposes a novel neural architecture and ranking loss for embedding cross-modal data across time, preserving semantic similarity and temporal alignment.

Findings

1. DCM effectively organizes multimodal instances over time.

2. DCM preserves semantic cross-modal correlations at each time point.

3. Qualitative results suggest improved browsing and understanding of multimodal content.

Abstract

Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time. Under this paradigm, a new embedding is needed that structures visual-textual interactions according to the temporal dimension, thus preserving the data's original temporal organisation. This paper introduces a novel diachronic cross-modal embedding (DCM), where cross-modal correlations are represented in embedding space throughout the temporal dimension, preserving semantic similarity at each instant t. To achieve this, we trained a neural cross-modal architecture under a novel ranking loss strategy that, for each multimodal instance, enforces the temporal alignment of neighbouring instances through subspace structuring constraints based on a temporal alignment window. Experimental results show that our DCM embedding successfully organises instances over…
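To make the idea of a ranking loss constrained by a temporal alignment window more concrete, here is a minimal sketch in Python with NumPy. It is not the loss defined in the paper, whose exact form is not given in this summary. It is a generic max-margin image-text ranking loss in which only instances falling outside a temporal window around the anchor are treated as negatives. The function name, the `window` and `margin` parameters, and the assumption that row i of each matrix belongs to the same multimodal instance are all hypothetical choices made for illustration.

```python
import numpy as np


def temporal_window_ranking_loss(img, txt, t, window=1.0, margin=0.2):
    """Illustrative hinge ranking loss with a temporal window (not the paper's loss).

    img, txt: (n, d) arrays of L2-normalised embeddings; row i of each
              is assumed to come from the same multimodal instance.
    t:        (n,) array of timestamps, one per instance.
    window, margin: hypothetical hyperparameters.
    """
    sim = img @ txt.T                    # pairwise similarities, shape (n, n)
    pos = np.diag(sim)[:, None]          # similarity of each matched pair
    # Only instances further apart in time than the window act as negatives;
    # temporal neighbours (and the matched pair itself) are left unpenalised.
    outside = np.abs(t[:, None] - t[None, :]) > window
    hinge = np.maximum(0.0, margin - pos + sim)
    return float((hinge * outside).sum() / max(outside.sum(), 1))


# Toy usage: three instances, the third far from the others in time.
img = np.eye(3)
txt = np.eye(3)
t = np.array([0.0, 0.5, 5.0])
loss = temporal_window_ranking_loss(img, txt, t, window=1.0)
```

In this simplification the window only decides which pairs count as negatives. The subspace structuring constraints mentioned in the abstract would need additional terms that this sketch does not attempt to reproduce.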
