Contrastive learning-based video quality assessment-jointed video vision transformer for video recognition

Jian Sun; Mohammad H. Mahoor

arXiv:2603.10965 · cs.CV · March 12, 2026

Open Access

TL;DR

This paper introduces SSL-V3, a self-supervised learning framework that combines Video Quality Assessment (VQA) with video classification using a Video Vision Transformer, improving accuracy, especially on healthcare video datasets.

Contribution

The paper proposes a joint learning approach that integrates VQA into video classification, addressing label scarcity and enhancing the robustness of video recognition models.

Findings

01. Achieved 94.87% accuracy on the I-CONECT dataset
02. Effectively links VQA scores with classification features
03. Demonstrates robustness across two datasets

Abstract

Video quality significantly affects video classification. We encountered this problem when classifying Mild Cognitive Impairment (MCI): classification worked well on clear videos but worse on blurred ones. This led us to realize that referring to Video Quality Assessment (VQA) may improve video classification. To this end, this paper proposes a Self-Supervised Learning-based Video Vision Transformer combined with No-reference VQA for video classification (SSL-V3). SSL-V3 uses a Combined-SSL mechanism to join VQA into video classification and to address the shortage of VQA labels, which is common in video datasets and makes it impossible to provide an accurate Video Quality Score. In brief, Combined-SSL uses the video quality score as a factor that directly tunes the feature map used for video classification. The score then serves as an intersection point that links VQA and classification, using the supervised classification…
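The abstract states only that Combined-SSL uses the quality score as a factor to tune the classification feature map; it does not give the exact formula. The following is a minimal sketch under assumed choices, not the authors' implementation. A hypothetical linear VQA head with a sigmoid produces a score in (0, 1). Every token feature is then multiplied by that score, and a hypothetical linear head classifies the mean-pooled result. All function names, weights, and the multiplicative form are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average per-token features into one clip-level vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def predict_quality(pooled: Vector, w: Vector, b: float) -> float:
    """Hypothetical no-reference VQA head: linear projection + sigmoid -> score in (0, 1)."""
    return sigmoid(sum(p * wi for p, wi in zip(pooled, w)) + b)


def modulate_features(tokens: List[Vector], quality: float) -> List[Vector]:
    """Assumed tuning rule: scale every token feature by the quality score."""
    return [[quality * v for v in tok] for tok in tokens]


def classify(tokens: List[Vector], weights: List[Vector]) -> Vector:
    """Hypothetical linear classification head on mean-pooled, quality-tuned features."""
    pooled = mean_pool(tokens)
    return [sum(p * wi for p, wi in zip(pooled, row)) for row in weights]


# Toy example: two tokens with 2-dimensional features (illustrative values only).
tokens = [[1.0, 2.0], [3.0, 4.0]]
q = predict_quality(mean_pool(tokens), w=[0.0, 0.0], b=0.0)  # sigmoid(0) = 0.5
tuned = modulate_features(tokens, q)
logits = classify(tuned, weights=[[1.0, 0.0], [0.0, 1.0]])
print(q, logits)  # → 0.5 [1.0, 1.5]
```

With multiplicative scaling and a linear head, the logits shrink in proportion to the score, so a low-quality clip yields a flatter softmax and a less confident prediction. The abstract does not say whether SSL-V3 uses this form or another modulation, such as an additive or gated one.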

Taxonomy

Topics: Image and Video Quality Assessment · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection