(Un)likelihood Training for Interpretable Embedding

Jiaxin Wu; Chong-Wah Ngo; Wing-Kwong Chan; Zhijian Hou

arXiv:2207.00282 · cs.CV · November 13, 2023 · 1 citation

TL;DR

This paper introduces likelihood and unlikelihood training objectives to enhance interpretability and address label sparsity in cross-modal video representation learning, improving ad-hoc video search performance.

Contribution

It proposes a novel encoder-decoder network with interpretable training objectives for cross-modal video representation learning, addressing dataset bias and label sparsity issues.
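The sketch below illustrates what it means to decode an embedding into readable concepts. It assumes the decoder is a single linear layer followed by a sigmoid that scores each concept in a fixed vocabulary. The names `decode_concepts` and `top_k_concepts`, the toy vocabulary, and the weights are hypothetical and are not taken from the ITV repository. The paper's actual decoder may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: a linear + sigmoid "concept decoder". Not the authors' ITV code.

def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_concepts(
    embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every concept in `vocab` independently (multi-label, not softmax)."""
    scored = []
    for row, b, name in zip(weights, bias, vocab):
        # One logit per concept: dot product of its weight row with the embedding.
        logit = sum(w * e for w, e in zip(row, embedding)) + b
        scored.append((name, sigmoid(logit)))
    return scored

def top_k_concepts(scored: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-probability concepts as a human-readable explanation."""
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy example: a 2-d embedding and a 3-concept vocabulary (hypothetical values).
vocab = ["dog", "beach", "car"]
weights = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0]
embedding = [1.0, 0.5]

explanation = top_k_concepts(decode_concepts(embedding, weights, bias, vocab), k=2)
print([name for name, _ in explanation])  # ['dog', 'beach']
```

This sketch scores concepts with independent sigmoids instead of a softmax. As a result, several concepts can be active at once, which suits videos that show many objects and actions at the same time.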

Findings

1. Outperforms state-of-the-art retrieval models on the TRECVid and MSR-VTT datasets.
2. Demonstrates improved interpretability of embeddings.
3. Achieves statistically significant performance gains.

Abstract

Cross-modal representation learning has become the new normal for bridging the semantic gap between text and visual data. Learning modality-agnostic representations in a continuous latent space, however, is often treated as a black-box, data-driven training process. It is well known that the effectiveness of representation learning depends heavily on the quality and scale of the training data. For video representation learning, obtaining a complete set of labels that annotates the full spectrum of video content is difficult, if not impossible. These two issues, black-box training and dataset bias, make representation learning hard to deploy in practice for video understanding, because its results are unexplainable and unpredictable. In this paper, we propose two novel training objectives, likelihood and unlikelihood functions, to unroll semantics behind embeddings while addressing…
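The abstract is cut off before it defines the two objectives, so the sketch below offers only one plausible reading. It assumes the decoder outputs an independent probability for each concept. The likelihood term rewards high probability on concepts treated as present, for example words from the paired query. The unlikelihood term penalizes probability on concepts treated as absent, using the generic form -log(1 - p). This summary does not say how the paper selects the positive and negative concept sets or how it weights the two terms. The function name and the toy probabilities are hypothetical.

```python
import math
from typing import Dict, Iterable

# Hypothetical sketch of likelihood + unlikelihood terms over a concept vocabulary.
# Not the paper's exact objective; set selection and weighting are assumptions.

def likelihood_unlikelihood_loss(
    probs: Dict[str, float],
    positives: Iterable[str],
    negatives: Iterable[str],
    eps: float = 1e-12,
) -> float:
    """Sum of -log p over positive concepts and -log(1 - p) over negative concepts."""
    # Likelihood term: small when positive concepts receive high probability.
    likelihood = -sum(math.log(max(probs[c], eps)) for c in positives)
    # Unlikelihood term: small when negative concepts receive low probability.
    unlikelihood = -sum(math.log(max(1.0 - probs[c], eps)) for c in negatives)
    return likelihood + unlikelihood

probs = {"dog": 0.9, "beach": 0.5, "car": 0.1}
loss = likelihood_unlikelihood_loss(probs, positives=["dog"], negatives=["car"])
print(round(loss, 4))  # 0.2107
```

A concept that appears in neither set (here `beach`) contributes nothing to this loss, so unlabeled concepts are left unconstrained. Whether the paper handles label sparsity this way is an assumption.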

Code & Models

Repositories

nikkiwoo-gh/ITV
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning