Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis, Nikolaos Gkalelis, Vasileios Mezaris

TL;DR
This paper presents Masked Feature Modelling (MFM), an unsupervised pre-training method for Graph Attention Networks that enhances bottom-up video event recognition by leveraging feature masking and a pretrained Visual Tokenizer.
Contribution
The paper introduces MFM, a novel unsupervised pre-training approach for GAT blocks that improves video event recognition accuracy when the pre-trained block is integrated into an existing bottom-up architecture (ViGAT).
Findings
MFM improves event recognition performance on the YLI-MED dataset.
Pre-trained GAT blocks enhance the overall accuracy of the ViGAT architecture.
Experimental results validate the effectiveness of feature masking in unsupervised learning.
Abstract
In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM uses a pretrained Visual Tokenizer to reconstruct the masked features of objects within a video, with pre-training performed on the MiniKinetics dataset. We then incorporate the pre-trained GAT block into ViGAT, a state-of-the-art bottom-up supervised video event recognition architecture, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
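The abstract describes MFM only at a high level. As an illustration, here is a minimal, framework-free sketch of one MFM-style forward pass and loss. It rests on assumptions that are ours, not the paper's: the Visual Tokenizer is modelled as a nearest-neighbour lookup into a codebook, masked object features are replaced by a shared mask vector, the GAT block is a single-head attention layer over a fully connected graph of object features, and a linear head predicts a token id for each masked node. All function names, sizes, and the 50% masking ratio are hypothetical.

```python
import math
import random

# Illustrative sketch only: the tokenizer, masking scheme and GAT layout below
# are assumptions, not the paper's implementation.


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def log_softmax(xs):
    """Numerically stable log-softmax."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def nearest_code(feature, codebook):
    """Hypothetical visual tokenizer: id of the closest codebook vector (squared L2)."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])))


def mask_nodes(features, ratio, mask_token, rng):
    """Replace a random subset of object features with a shared mask vector."""
    n_mask = max(1, round(ratio * len(features)))
    masked = set(rng.sample(range(len(features)), n_mask))
    corrupted = [list(mask_token) if i in masked else list(f)
                 for i, f in enumerate(features)]
    return corrupted, sorted(masked)


def gat_layer(nodes, W, a_src, a_dst, slope=0.2):
    """Single-head graph attention over a fully connected object graph (self-loops included)."""
    h = [matvec(W, x) for x in nodes]
    out = []
    for hi in h:
        # e_ij = LeakyReLU(a_src . W x_i + a_dst . W x_j), softmax-normalised over j
        src = sum(a * v for a, v in zip(a_src, hi))
        scores = []
        for hj in h:
            e = src + sum(a * v for a, v in zip(a_dst, hj))
            scores.append(e if e > 0 else slope * e)
        alpha = [math.exp(s) for s in log_softmax(scores)]
        out.append([sum(alpha[j] * h[j][d] for j in range(len(h)))
                    for d in range(len(hi))])
    return out


def mfm_loss(features, codebook, params, ratio, rng):
    """Mean cross-entropy on masked nodes between predicted and tokenizer token ids."""
    targets = [nearest_code(f, codebook) for f in features]  # targets from unmasked input
    corrupted, masked = mask_nodes(features, ratio, params["mask_token"], rng)
    hidden = gat_layer(corrupted, params["W"], params["a_src"], params["a_dst"])
    loss = 0.0
    for i in masked:
        logp = log_softmax(matvec(params["W_head"], hidden[i]))
        loss -= logp[targets[i]]
    return loss / len(masked), masked, targets


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix used as toy data and parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, D, K = 6, 4, 8  # objects, feature dim, codebook size (hypothetical toy sizes)
features = rand_matrix(N, D, rng)
codebook = rand_matrix(K, D, rng)
params = {"W": rand_matrix(D, D, rng), "a_src": rand_matrix(1, D, rng)[0],
          "a_dst": rand_matrix(1, D, rng)[0], "W_head": rand_matrix(K, D, rng),
          "mask_token": [0.0] * D}
loss, masked, targets = mfm_loss(features, codebook, params, 0.5, rng)
print(f"masked nodes: {masked}, MFM loss: {loss:.3f}")
```

Only the loss is computed here; in practice the parameters would be optimised by gradient descent in an autodiff framework, and the resulting GAT weights would then serve as the starting point for the corresponding block in ViGAT.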
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
Methods: Graph Attention Network
