Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation

Amirhossein Dadashzadeh; Alan Whone; Majid Mirmehdi

arXiv:2112.04011·cs.CV·April 26, 2022

TL;DR

This paper introduces auxSKD, an auxiliary pretraining method based on knowledge distillation for self-supervised video representation learning. It improves generalization when the pretraining dataset is small or when the pretraining and finetuning domains differ. The paper also introduces a new pretext task called VSPP.

Contribution

The paper proposes auxSKD, a novel auxiliary pretraining approach that uses similarity-based knowledge distillation, and introduces VSPP, a new pretext task for learning better video representations.

Findings

01
auxSKD outperforms state-of-the-art methods on the UCF101 and HMDB51 datasets.

02
Adding auxSKD improves existing self-supervised methods such as VCOP, VideoPace, and RSPNet.

03
auxSKD improves generalization on smaller and domain-shifted datasets.

Abstract

Despite the outstanding success of self-supervised pretraining methods for video representation learning, they generalise poorly when the unlabelled dataset for pretraining is small or when the domain difference between the unlabelled data in the source task (pretraining) and the labelled data in the target task (finetuning) is significant. To mitigate these issues, we propose a novel approach to complement self-supervised pretraining via an auxiliary pretraining phase, based on knowledge similarity distillation, auxSKD, for better generalisation with a significantly smaller amount of video data, e.g. Kinetics-100 rather than Kinetics-400. Our method deploys a teacher network that iteratively distills its knowledge to the student model by capturing the similarity information between segments of unlabelled video data. The student model meanwhile solves a pretext task by exploiting this prior knowledge. We also…
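
The abstract does not give the loss used for distillation, so the following is only a minimal sketch of one common way to distill similarity information, not the authors' implementation. It assumes the teacher and student each embed a query video segment, compare it against a shared set of anchor segment embeddings, and that the student is trained to match the teacher's softmax similarity distribution under a KL divergence. The function names, the anchor set, and the temperature values are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def similarity_distribution(query, anchors, temperature):
    # Probability that each query segment "matches" each anchor segment,
    # derived from temperature-scaled cosine similarity.
    q = l2_normalize(query)      # shape (batch, dim)
    a = l2_normalize(anchors)    # shape (num_anchors, dim)
    return softmax(q @ a.T / temperature)

def similarity_kd_loss(student_q, teacher_q, anchors,
                       t_student=0.1, t_teacher=0.05):
    # KL(teacher || student) between the two similarity distributions,
    # averaged over the batch. Temperatures are assumed values.
    p_t = similarity_distribution(teacher_q, anchors, t_teacher)
    p_s = similarity_distribution(student_q, anchors, t_student)
    eps = 1e-12
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(16, 8))     # stand-in anchor segment embeddings
    teacher_q = rng.normal(size=(4, 8))    # stand-in teacher embeddings of 4 segments
    student_q = rng.normal(size=(4, 8))    # stand-in student embeddings of the same segments
    print(similarity_kd_loss(student_q, teacher_q, anchors))
```

In this sketch the loss is zero when the student reproduces the teacher's embeddings at the same temperature, and grows as the student's similarity ranking over the anchors drifts from the teacher's. In a real training loop the embeddings would come from video backbones and only the student would receive gradients.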

Code & Models

Repositories

plrbear/auxskd (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings