Convolutions Need Registers Too: HVS-Inspired Dynamic Attention for Video Quality Assessment

Mayesha Maliha R. Mithila; Mylene C.Q. Farias

arXiv:2601.11045 · eess.IV · January 19, 2026

Open Access

TL;DR

This paper introduces DAGR-VQA, a convolutional framework with global register tokens that provides dynamic attention inspired by the human visual system (HVS) for no-reference video quality assessment, reporting state-of-the-art performance and real-time efficiency.

Contribution

The paper presents what the authors describe as the first integration of register tokens into a convolutional backbone for dynamic saliency prediction in VQA, enabling temporally adaptive attention without explicit motion estimation.
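As a rough illustration of this idea, the sketch below shows one way a small set of register vectors could act as global context carriers on top of convolutional features. It is not the DAGR-VQA architecture. The function names (`register_saliency`, `run_video`), the dot-product attention read, the residual update, and the choice to carry registers from one frame to the next are all assumptions made for illustration, and the features are hand-written toy values rather than learned ones.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def register_saliency(feat, registers):
    """One hypothetical register read/write step for a single frame.

    feat:      H*W feature vectors (each of length C), i.e. a flattened conv map.
    registers: R register vectors (each of length C).
    Returns (updated registers, saliency weights over the H*W positions).
    """
    channels = len(feat[0])
    scale = 1.0 / math.sqrt(channels)
    updated = []
    for reg in registers:
        # Read: the register attends over every spatial position.
        weights = softmax([scale * dot(reg, f) for f in feat])
        pooled = [sum(w * f[c] for w, f in zip(weights, feat))
                  for c in range(channels)]
        # Residual update keeps earlier context in the register (assumption).
        updated.append([r + p for r, p in zip(reg, pooled)])
    # Write: score each position by its best match to any updated register.
    scores = [max(scale * dot(reg, f) for reg in updated) for f in feat]
    return updated, softmax(scores)


def run_video(frames, registers):
    """Carry registers across frames so the saliency map adapts over time."""
    maps = []
    for feat in frames:
        registers, saliency = register_saliency(feat, registers)
        maps.append(saliency)
    return maps


# Toy 2x2 frames (4 positions, 2 channels): the active region moves
# from position 0 in the first frame to position 3 in the second.
frames = [
    [[3.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [3.0, 0.0]],
]
maps = run_video(frames, [[1.0, 0.0]])
peaks = [m.index(max(m)) for m in maps]
print(peaks)  # prints [0, 3]
```

In this toy run the saliency peak follows the active region from position 0 to position 3, and no optical flow or other motion estimate is computed. A trained model would learn the register values and any projections around them; the fixed initial register here only keeps the example deterministic.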

Findings

01. Outperforms most top baselines on multiple datasets.

02. Achieves 387.7 FPS at 1080p for real-time use.

03. Ablation studies confirm the effectiveness of register tokens.

Abstract

No-reference video quality assessment (NR-VQA) estimates perceptual quality without access to a reference video, which makes the task challenging. Recent techniques leverage saliency or transformer attention, but they address the global context of the video signal only by using static maps as auxiliary inputs, rather than embedding that context within the feature extraction of the video sequence. We present Dynamic Attention with Global Registers for Video Quality Assessment (DAGR-VQA), the first framework integrating register tokens directly into a convolutional backbone for spatio-temporal, dynamic saliency prediction. By embedding learnable register tokens as global context carriers, our model enables dynamic, HVS-inspired attention, producing temporally adaptive saliency maps that track salient regions over time without explicit motion estimation. Our model integrates dynamic saliency maps with…

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Video Coding and Compression Technologies