Convolutions Need Registers Too: HVS-Inspired Dynamic Attention for Video Quality Assessment
Mayesha Maliha R. Mithila, Mylene C.Q. Farias

TL;DR
This paper introduces DAGR-VQA, a novel convolutional framework with global register tokens for dynamic, HVS-inspired attention in no-reference video quality assessment, achieving state-of-the-art performance and real-time efficiency.
Contribution
It presents the first integration of register tokens into a convolutional backbone for dynamic saliency prediction in VQA, enabling temporally adaptive attention without motion estimation.
Findings
Outperforms most top baselines on multiple datasets.
Achieves 387.7 FPS at 1080p for real-time use.
Ablation studies confirm the effectiveness of register tokens.
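To put the reported throughput in perspective, the short sketch below converts the 387.7 FPS figure into a per-frame latency and compares it with a playback rate. The 30 FPS playback target and the helper name `frame_latency_ms` are assumptions made for illustration. The summary does not state the hardware or batch settings behind the measurement.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps

REPORTED_FPS = 387.7   # throughput at 1080p, as reported in the findings
PLAYBACK_FPS = 30.0    # assumed playback rate; not stated in the source

latency = frame_latency_ms(REPORTED_FPS)   # roughly 2.58 ms per frame
budget = frame_latency_ms(PLAYBACK_FPS)    # roughly 33.3 ms per frame
headroom = budget / latency                # how many times faster than playback

print(f"per-frame latency: {latency:.2f} ms")
print(f"30 FPS budget:     {budget:.2f} ms")
print(f"headroom:          {headroom:.1f}x")
```

Under that assumption, the model would process each frame in well under a tenth of the time available between frames at 30 FPS.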
Abstract
No-reference video quality assessment (NR-VQA) estimates perceptual quality without a reference video, which makes the task challenging. While recent techniques leverage saliency or transformer attention, they address the global context of the video signal only by using static maps as auxiliary inputs, rather than embedding that context directly within the feature extraction of the video sequence. We present Dynamic Attention with Global Registers for Video Quality Assessment (DAGR-VQA), the first framework to integrate register tokens directly into a convolutional backbone for spatio-temporal, dynamic saliency prediction. By embedding learnable register tokens as global context carriers, our model enables dynamic, HVS-inspired attention, producing temporally adaptive saliency maps that track salient regions over time without explicit motion estimation. Our model integrates dynamic saliency maps with…
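The abstract does not say how the register tokens interact with the convolutional features, so the following is only a rough sketch of one plausible reading, not the authors' architecture. It assumes that each register attends over the flattened feature-map positions to collect global context, that the resulting context scores every position to form a saliency map, and that the registers are carried to the next frame. The function `register_saliency`, the dimensions, and the random inputs are all hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def register_saliency(features, registers):
    """Hypothetical register read/write step for one frame.

    features:  N vectors of size C (a flattened H*W conv feature map).
    registers: R vectors of size C (global context carriers).
    Returns (saliency, updated_registers).
    """
    channels = len(registers[0])
    scale = math.sqrt(channels)

    # Read: each register attends over all spatial positions and pools them.
    updated = []
    for reg in registers:
        weights = softmax([dot(reg, f) / scale for f in features])
        context = [sum(w * f[c] for w, f in zip(weights, features))
                   for c in range(channels)]
        updated.append(context)

    # Write: score each position against the pooled global context.
    saliency = []
    for f in features:
        score = sum(dot(u, f) for u in updated) / (len(updated) * scale)
        saliency.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid to (0, 1)
    return saliency, updated

random.seed(0)
C, N, R = 4, 6, 2  # channels, spatial positions, registers (toy sizes)
registers = [[random.gauss(0, 1) for _ in range(C)] for _ in range(R)]

# Carry the registers across frames; no optical flow or motion estimation.
for t in range(3):
    features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
    saliency, registers = register_saliency(features, registers)
    print(t, [round(s, 2) for s in saliency])
```

In this toy version the saliency at frame t depends on context pooled from the current frame by registers inherited from the previous one, which is one simple way a map could adapt over time without an explicit motion model.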
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Video Coding and Compression Technologies
