Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning

Xin Wang; Jiawei Wu; Da Zhang; Yu Su; William Yang Wang

arXiv:1811.02765 · cs.CL · November 27, 2018 · 5 cites

Open Access

TL;DR

This paper introduces a novel zero-shot video captioning task and proposes a Topic-Aware Mixture of Experts model that leverages external semantic knowledge to generalize to unseen activities.

Contribution

The paper presents a new zero-shot video captioning framework using topic-aware experts and external semantic embeddings, enabling generalization to unseen activities.

Findings

1. Effective in describing unseen activities
2. Utilizes an external topic-related text corpus
3. Shows strong generalization ability

Abstract

Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus and do not generalize to open-vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, e.g., word selection, semantic construction, and style expression, which makes it very challenging to depict novel activities without paired training data. Meanwhile, however, similar activities share some of those aspects. Therefore, we propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning, which learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned…
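The abstract's central mechanism, composing experts according to a topic embedding, can be illustrated with a small gating sketch. This is not the authors' TAMoE implementation: it assumes, hypothetically, that each expert is a linear map over a decoder hidden state and that a linear-plus-softmax gate turns the topic embedding into one mixing weight per expert. The names `topic_gate` and `compose_experts`, and all weights and embeddings below, are illustrative only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Plain matrix-vector product (rows of `matrix` dotted with `vec`)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def topic_gate(topic_emb: Vector, gate_weights: Matrix) -> Vector:
    """Hypothetical gate: one row of `gate_weights` per expert, scored against
    the topic embedding, then normalized so the mixing weights sum to 1."""
    return softmax(matvec(gate_weights, topic_emb))


def compose_experts(hidden: Vector, experts: List[Matrix], gate: Vector) -> Vector:
    """Hypothetical composition: each expert is a linear map of the decoder
    hidden state; the result is the gate-weighted sum of expert outputs."""
    outputs = [matvec(expert, hidden) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]


if __name__ == "__main__":
    # Two experts, 2-d topic embeddings (toy values, not from the paper).
    gate_weights = [[2.0, 0.0], [0.0, 2.0]]
    seen_topic = [1.0, 0.0]    # embedding of an activity seen in training
    unseen_topic = [0.9, 0.1]  # embedding of a similar, unseen activity

    print([round(g, 3) for g in topic_gate(seen_topic, gate_weights)])    # [0.881, 0.119]
    print([round(g, 3) for g in topic_gate(unseen_topic, gate_weights)])  # [0.832, 0.168]

    experts = [[[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity map
               [[0.0, 1.0], [1.0, 0.0]]]  # expert 1: swaps the two dimensions
    gate = topic_gate(unseen_topic, gate_weights)
    print(compose_experts([1.0, 2.0], experts, gate))
```

Because the gate in this sketch depends only on the topic embedding, an unseen activity whose embedding lies near a seen one receives similar mixing weights, so it reuses roughly the same blend of experts. Under these assumptions, that is one way knowledge could transfer to novel activities without paired training data.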

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization