Learning Trajectory-Aware Transformer for Video Super-Resolution

Chengxu Liu; Huan Yang; Jianlong Fu; Xueming Qian

arXiv:2204.04216 · eess.IV · April 21, 2022 · 6 cites

Open Access · 1 Repo

TL;DR

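This paper introduces TTVSR, a trajectory-aware Transformer for video super-resolution. By restricting attention to visual tokens along pre-aligned spatio-temporal trajectories, it captures long-range dependencies at lower computational cost than vanilla vision Transformers, and it is reported to outperform state-of-the-art methods on four benchmarks.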

Contribution

The paper proposes a novel trajectory-aware Transformer architecture, together with a cross-scale tokenization module, to improve long-range modeling in video super-resolution.

Findings

01. TTVSR outperforms state-of-the-art models on four benchmarks.

02. Trajectory-based attention reduces computational cost relative to vanilla vision Transformers.

03. Cross-scale tokenization handles scale variations effectively.

Abstract

Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, there are grand challenges to effectively utilize temporal dependency in entire video sequences. Existing approaches usually align and aggregate video frames from limited adjacent frames (e.g., 5 or 7 frames), which prevents these approaches from satisfactory results. In this paper, we take one step further to enable effective spatio-temporal learning in videos. We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). In particular, we formulate video frames into several pre-aligned trajectories which consist of continuous visual tokens. For a query token, self-attention is only learned on relevant visual tokens along spatio-temporal trajectories. Compared with vanilla vision Transformers,…
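The idea of attending only along a trajectory can be illustrated with a minimal sketch. It assumes standard scaled dot-product softmax attention and that trajectories are already given as per-frame (y, x) locations; the function names (`trajectory_attention`, `attended_tokens`) and the data layout are hypothetical, chosen for illustration, and are not taken from the authors' code or their exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def trajectory_attention(query, frames, trajectory):
    """Attend from one query token to the tokens on its trajectory only.

    query:      feature vector of the query token (list of floats)
    frames:     frames[t][y][x] is the token feature vector at (y, x) in frame t
    trajectory: trajectory[t] = (y, x), the assumed location of the query's
                content in frame t (one location per frame)
    """
    # Gather exactly one token per frame instead of every token in every frame.
    tokens = [frames[t][y][x] for t, (y, x) in enumerate(trajectory)]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in tokens])
    # Weighted sum of the gathered tokens (keys and values coincide here).
    out = [sum(w * tok[d] for w, tok in zip(weights, tokens))
           for d in range(len(query))]
    return out, weights


def attended_tokens(num_frames, height, width):
    """Tokens one query compares against: full attention vs. one trajectory."""
    return num_frames * height * width, num_frames


# Toy example: 3 frames of 2x2 tokens with 2-dimensional features.
zero = [0.0, 0.0]
frames = [[[zero, zero], [zero, zero]] for _ in range(3)]
frames[1][0][1] = [1.0, 0.0]          # the query's content sits here in frame 1
trajectory = [(0, 0), (0, 1), (1, 1)]  # hypothetical pre-aligned locations
out, weights = trajectory_attention([1.0, 0.0], frames, trajectory)
```

In this toy setup the largest weight falls on frame 1, where the trajectory token matches the query. Under the same assumptions, `attended_tokens(30, 64, 64)` returns `(122880, 30)`: a query compares against 30 tokens along its trajectory rather than 122,880 tokens under full spatio-temporal attention, which is the intuition behind the cost reduction the abstract refers to.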

Code & Models

Repositories

researchmm/TTVSR
PyTorch · Official

Taxonomy

Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention · Layer Normalization · Absolute Position Encodings · Softmax · Residual Connection