Learning Trajectory-Aware Transformer for Video Super-Resolution
Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

TL;DR
This paper introduces TTVSR, a trajectory-aware Transformer for video super-resolution. It captures long-range spatio-temporal dependencies by restricting self-attention to visual tokens along pre-aligned trajectories, which lowers computational cost relative to vanilla vision Transformers, and it outperforms existing methods.
Contribution
The paper proposes a novel trajectory-aware Transformer architecture with a cross-scale tokenization module for improved long-range video super-resolution.
Findings
TTVSR outperforms state-of-the-art models on four benchmarks.
Restricting self-attention to tokens along spatio-temporal trajectories reduces computational cost compared with vanilla vision Transformers.
The cross-scale tokenization handles scale variations effectively.
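To make the cost finding concrete, the following is a rough counting sketch, not the paper's own complexity analysis. It assumes a hypothetical setup in which each query token of one frame either attends to every token in every frame (full spatio-temporal attention) or to exactly one token per frame along its trajectory. The function name and the frame and token counts are illustrative assumptions.

```python
def attention_pair_counts(num_frames: int, tokens_per_frame: int) -> dict:
    """Count query-key pairs for the tokens of ONE query frame.

    Assumption (illustrative only): a trajectory contributes exactly one
    key token per frame for each query token.
    """
    # Full spatio-temporal attention: each query token attends to every
    # token in every frame of the sequence.
    full = tokens_per_frame * (num_frames * tokens_per_frame)
    # Trajectory-restricted attention: each query token attends to a single
    # token per frame, i.e. num_frames keys in total.
    trajectory = tokens_per_frame * num_frames
    return {"full": full, "trajectory": trajectory, "ratio": full // trajectory}


# Hypothetical sizes: 30 frames, a 64 x 64 grid of tokens per frame.
counts = attention_pair_counts(num_frames=30, tokens_per_frame=64 * 64)
print(counts)
```

Under these assumptions the number of query-key pairs shrinks by a factor equal to the number of tokens per frame, since the spatial dimension of the key set collapses to a single location per frame.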
Abstract
Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, effectively utilizing temporal dependency across entire video sequences remains a major challenge. Existing approaches usually align and aggregate information from a limited number of adjacent frames (e.g., 5 or 7 frames), which prevents them from achieving satisfactory results. In this paper, we take one step further to enable effective spatio-temporal learning in videos. We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). In particular, we formulate video frames into several pre-aligned trajectories which consist of continuous visual tokens. For a query token, self-attention is only learned on relevant visual tokens along spatio-temporal trajectories. Compared with vanilla vision Transformers,…
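The idea of attending only along trajectories can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: trajectories are assumed to be given as integer token indices per frame (how they are estimated is not covered here), plain softmax attention is used, and the function and argument names are hypothetical.

```python
import numpy as np


def trajectory_attention(query, keys, values, trajectories):
    """Softmax attention restricted to tokens along per-query trajectories.

    query:        (N, C) tokens of the current frame
    keys, values: (T, N, C) tokens of T other frames
    trajectories: (T, N) integer array (assumed given); trajectories[t, i] is
                  the index of the token in frame t that lies on the
                  trajectory of query token i
    returns:      (N, C) aggregated features, one per query token
    """
    T, N, C = keys.shape
    frame_idx = np.arange(T)[:, None]            # (T, 1), broadcasts with (T, N)
    # Gather one key/value token per frame for every query token.
    k_traj = keys[frame_idx, trajectories]       # (T, N, C)
    v_traj = values[frame_idx, trajectories]     # (T, N, C)
    # Scaled dot-product scores over the T tokens of each trajectory.
    scores = np.einsum("nc,tnc->nt", query, k_traj) / np.sqrt(C)  # (N, T)
    scores = scores - scores.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)        # softmax over T
    # Weighted sum of the values along each trajectory.
    return np.einsum("nt,tnc->nc", weights, v_traj)


# Toy usage with random data: 4 frames, 6 tokens per frame, 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(4, 6, 8))
v = rng.normal(size=(4, 6, 8))
traj = rng.integers(0, 6, size=(4, 6))
out = trajectory_attention(q, k, v, traj)
print(out.shape)
```

Each query token sees only T keys rather than all T × N tokens, which is where the saving over full spatio-temporal attention would come from in this simplified setting.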
Taxonomy
Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention · Layer Normalization · Absolute Position Encodings · Softmax · Residual Connection
