Exploiting Temporal Coherence for Multi-modal Video Categorization

Palash Goyal; Saurabh Sahu; Shalini Ghosh; Chul Lee

arXiv:2002.03844 · cs.CV · June 9, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a temporal coherence-based regularization method for multimodal video categorization; the authors report that it improves performance across architectures such as RNNs, NetVLAD, and Transformers.

Contribution

The paper proposes a new temporal coherence regularization technique applicable to multiple model types for enhanced multimodal video categorization.

Findings

01. Outperforms state-of-the-art baseline models

02. Effective across different model architectures

03. Improves accuracy in video content analysis

Abstract

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization by using a multimodal approach. We have developed a novel temporal coherence-based regularization approach, which applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments how our proposed multimodal video categorization models with temporal coherence outperform strong state-of-the-art baseline models.
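
The abstract does not give the form of the regularizer, so the following is only a generic illustration of what a temporal coherence penalty can look like, not the paper's formulation. It assumes (hypothetically) that each frame is represented by an embedding vector and that coherence is encouraged by penalizing the mean squared distance between consecutive embeddings. The names temporal_coherence_penalty, regularized_loss, and weight are made up for this sketch.

```python
from typing import Sequence


def temporal_coherence_penalty(frames: Sequence[Sequence[float]]) -> float:
    """Mean squared L2 distance between consecutive frame embeddings.

    Assumption: a generic smoothness penalty, not the method from the paper.
    """
    if len(frames) < 2:
        return 0.0  # a single frame has no neighbor to be coherent with
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(frames) - 1)


def regularized_loss(task_loss: float,
                     frames: Sequence[Sequence[float]],
                     weight: float = 0.1) -> float:
    """Add the weighted penalty to a task loss (weight is a hypothetical knob)."""
    return task_loss + weight * temporal_coherence_penalty(frames)


# Toy embeddings: one sequence drifts slowly, the other jumps back and forth.
smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]]
jumpy = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

print(temporal_coherence_penalty(smooth) < temporal_coherence_penalty(jumpy))
```

Under this assumed form, the slowly drifting sequence receives a smaller penalty than the jumpy one, so adding the term to a classification loss nudges a model toward representations that change gradually over time. Because the penalty is computed on per-frame representations, it does not depend on which architecture produced them.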

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning