Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition

Dimitrios Daskalakis; Nikolaos Gkalelis; Vasileios Mezaris

arXiv:2308.12673 · cs.CV · August 28, 2023

TL;DR

This paper presents Masked Feature Modelling (MFM), an unsupervised pre-training method for Graph Attention Network (GAT) blocks that combines feature masking with a pretrained Visual Tokenizer; the pre-trained block improves bottom-up video event recognition.

Contribution

The paper introduces MFM, a novel unsupervised pre-training approach for GAT blocks; integrating a GAT block pre-trained with MFM into the ViGAT architecture improves video event recognition accuracy.

Findings

1. MFM improves event recognition performance on the YLI-MED dataset.
2. Integrating the MFM-pre-trained GAT block into ViGAT improves the architecture's overall accuracy.
3. The experimental results support the effectiveness of feature masking for unsupervised pre-training.

Abstract

In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM utilizes a pretrained Visual Tokenizer to reconstruct masked features of objects within a video, leveraging the MiniKinetics dataset. We then incorporate the pre-trained GAT block into a state-of-the-art bottom-up supervised video-event recognition architecture, ViGAT, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
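The abstract describes MFM only at a high level. The sketch below shows the general shape of a masked-modelling objective of this kind, assuming a BEiT-style setup: a frozen visual tokenizer assigns each object a discrete code index, some object features are replaced by a shared mask vector, and the encoder plus a linear head is trained to predict the codes of the hidden objects with cross-entropy. The function names, the 50% mask ratio, the zero mask vector, the placeholder codes and the mean-pooling encoder that stands in for the GAT block are all hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


def random_mask(n, ratio, rng):
    """Boolean mask hiding round(n * ratio) objects (at least one). Hypothetical helper."""
    k = max(1, int(round(n * ratio)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


def masked_feature_modelling_loss(features, token_ids, mask, encoder, head, mask_token):
    """Cross-entropy of predicted tokenizer codes, averaged over masked objects only.

    features:   (N, D) object features of one video (assumed layout)
    token_ids:  (N,) code index per object from a frozen visual tokenizer (assumed)
    mask:       (N,) boolean, True where the object feature is hidden
    encoder:    callable (N, D) -> (N, D); stands in for the GAT block
    head:       (D, K) linear classifier over K tokenizer codes
    mask_token: (D,) vector that replaces the hidden features
    """
    # Replace the selected object features with the shared mask vector.
    corrupted = np.where(mask[:, None], mask_token[None, :], features)
    hidden = encoder(corrupted)                      # (N, D) context-aware features
    logits = hidden @ head                           # (N, K) scores over codes
    # Numerically stable log-softmax over the K codes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each object's code; keep masked objects only.
    nll = -log_probs[np.arange(len(token_ids)), token_ids]
    return nll[mask].mean()


def mean_context_encoder(x):
    """Hypothetical stand-in for the GAT block: adds the mean of all object features."""
    return x + x.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
N, D, K = 10, 16, 32                       # objects, feature size, codebook size (illustrative)
features = rng.normal(size=(N, D))
token_ids = rng.integers(0, K, size=N)     # placeholder codes; a real tokenizer would supply these
mask = random_mask(N, ratio=0.5, rng=rng)
head = rng.normal(scale=0.1, size=(D, K))
mask_token = np.zeros(D)

loss = masked_feature_modelling_loss(features, token_ids, mask, mean_context_encoder, head, mask_token)
print(f"masked objects: {mask.sum()}, loss: {loss:.3f}")
```

Restricting the loss to masked positions means the encoder can only score well by inferring hidden objects from the visible ones, which is the usual rationale for masked modelling as pre-training.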

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Graph Neural Networks · Multimodal Machine Learning Applications

Methods: Graph Attention Network
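A Graph Attention Network layer computes each node's new feature as an attention-weighted sum of its neighbours' projected features, with weights alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j])). The following is a minimal single-head layer in the generic formulation of Veličković et al. (2018), applied to a fully connected graph of object features. It is an illustrative assumption: the GAT block used in ViGAT and MFM may differ (for example in the number of heads, normalisation, or how the graph is built), and `gat_layer` and its parameters are hypothetical names.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, W, a, adj=None):
    """One single-head graph attention layer (generic Velickovic et al. form).

    h:   (N, F) node features, e.g. one vector per detected object
    W:   (F, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    adj: optional (N, N) boolean adjacency with self-loops; fully connected if None
    Returns ELU-activated (N, F_out) features and the (N, N) attention matrix.
    """
    z = h @ W                                        # (N, F_out) projected features
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into a source term and a target term.
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    if adj is not None:
        e = np.where(adj, e, -np.inf)                # block attention to non-neighbours
    # Row-wise softmax over neighbours j.
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    out = alpha @ z
    # ELU output nonlinearity; expm1 on the clipped values avoids overflow warnings.
    return np.where(out > 0, out, np.expm1(np.minimum(out, 0))), alpha


rng = np.random.default_rng(0)
N, F, F_out = 6, 8, 4
h = rng.normal(size=(N, F))
W = rng.normal(scale=0.3, size=(F, F_out))
a = rng.normal(size=2 * F_out)

out, alpha = gat_layer(h, W, a)
print(out.shape, alpha.sum(axis=1))
```

Passing an adjacency mask restricts attention to graph neighbours; every row must keep at least its self-loop so that the softmax stays well defined.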