TranStable: Towards Robust Pixel-level Online Video Stabilization by Jointing Transformer and CNN

Zhizhen Li; Tianyi Zhuo; Yifei Cao; Jizhe Yu; Yu Liu

arXiv:2501.15138·cs.CV·January 28, 2025

Open Access

TL;DR

TranStable introduces a novel end-to-end video stabilization framework combining Transformer and CNN to produce pixel-level warping maps, reducing distortion and cropping while maintaining visual fidelity.

Contribution

The paper presents TranStable, a new framework integrating Transformer and CNN with a hierarchical fusion module and a stability discriminator for improved pixel-level stabilization.

Findings

01 Reports state-of-the-art results on multiple benchmarks (experiments cover NUS, DeepStab, and Selfie).

02 Reduces jitter artifacts and distortion.

03 Maintains a wider field of view during stabilization.

Abstract

Video stabilization often struggles with distortion and excessive cropping. This paper proposes a novel end-to-end framework, named TranStable, to address these challenges. It comprises a generator and a discriminator. We establish TransformerUNet (TUNet) as the generator, which uses the Hierarchical Adaptive Fusion Module (HAFM) to integrate Transformer and CNN and so leverage both global and local features across multiple visual cues. By modeling frame-wise relationships, it generates robust pixel-level warping maps for stable geometric transformations. Furthermore, we design the Stability Discriminator Module (SDM), which provides pixel-wise supervision for authenticity and consistency during training, ensuring a more complete field of view while minimizing jitter artifacts and enhancing visual fidelity. Extensive experiments on NUS, DeepStab, and Selfie benchmarks demonstrate…
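
To make the idea of a pixel-level warping map concrete, the sketch below applies a dense per-pixel offset field to a single grayscale frame using backward warping with bilinear sampling. It is an illustration only and not the paper's implementation: the names `warp_frame`, `Frame`, and `WarpMap`, the `(dx, dy)` offset convention, and the border clamping are all assumptions made for this example. In TranStable the map would be predicted by the generator; here it is written by hand.

```python
from typing import List, Tuple

# Hypothetical types for this sketch (not from the paper).
Frame = List[List[float]]                    # grayscale frame, H x W
WarpMap = List[List[Tuple[float, float]]]    # per-pixel (dx, dy) offsets, H x W


def warp_frame(frame: Frame, warp: WarpMap) -> Frame:
    """Backward-warp: output[y][x] samples `frame` at (x + dx, y + dy)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = warp[y][x]
            # Assumption: clamp sample positions to the frame border.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
            bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out


# A 3x3 gradient frame, an identity map, and a map that samples one column to the right.
frame = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
identity = [[(0.0, 0.0)] * 3 for _ in range(3)]
shift = [[(1.0, 0.0)] * 3 for _ in range(3)]
stabilized = warp_frame(frame, shift)
```

Because every pixel carries its own offset, such a map can express non-rigid corrections that a single global homography cannot. Clamping at the border is just one simple way to handle samples that fall outside the frame; the paper's own handling of out-of-frame regions is not described in the abstract.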

Taxonomy

Topics: Image and Video Stabilization · Advanced Steganography and Watermarking Techniques · Advanced Optical Imaging Technologies

Methods: Softmax · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing