Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

Ping Wang, Yulun Zhang, Lishun Wang, and Xin Yuan

arXiv:2407.11946 · cs.CV · July 18, 2024

Open Access · 1 Repo

TL;DR

This paper introduces HiSViT, a hierarchical separable video transformer that improves snapshot compressive imaging reconstruction by focusing on multi-scale spatial interactions and reducing computational costs, outperforming previous methods.

Contribution

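The paper proposes a hierarchical separable Transformer architecture for efficient, multi-scale video reconstruction in SCI. Its building blocks are Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and a Gated Self-Modulated Feed-Forward Network (GSM-FFN). The design targets a limitation of earlier Transformers, which were built without insight into the SCI degradation.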

Findings

01 Outperforms previous methods by >0.5 dB in PSNR.

02 Uses fewer parameters and lower complexity.

03 Effective multi-scale spatial-temporal modeling.
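
For readers unfamiliar with the metric in the first finding, PSNR is computed from the mean squared error as 10 * log10(peak^2 / MSE). The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name `psnr` and the assumption that pixel values lie in [0, 1] (so `peak=1.0`) are choices made here for illustration.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB (assumes values scaled to [0, peak])."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(peak ** 2 / mse))

# A constant error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
print(round(psnr(reference, estimate), 2))
```

Because the scale is logarithmic, a 0.5 dB gain corresponds to an MSE that is lower by a factor of 10^(-0.05), roughly 11%.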

Abstract

Transformers have achieved state-of-the-art performance on the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstruction architecture without temporal aggregation in early layers and with the Hierarchical Separable Video Transformer (HiSViT) as its building block. HiSViT is built from multiple groups of Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and Gated Self-Modulated Feed-Forward Network (GSM-FFN) with dense connections, each of which is conducted within a separate channel portion at a different scale, for multi-scale interactions and long-range modeling. By separating spatial operations from temporal…
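
The "mixed degradation of spatial masking and temporal aliasing" mentioned in the abstract can be illustrated with the commonly used video SCI measurement model, in which each frame is multiplied element-wise by a mask and the masked frames are summed into a single snapshot. The sketch below assumes that standard model and omits measurement noise. The function name `sci_forward`, the array sizes, and the random binary masks are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def sci_forward(frames, masks):
    """Simulate one snapshot: Y = sum over t of (M_t * X_t), noise omitted.

    frames, masks: arrays of shape (T, H, W).
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (T, H, W)")
    # Element-wise product is the spatial masking;
    # summing over the time axis is the temporal aliasing.
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 8, 4, 4                      # illustrative sizes only
frames = rng.random((T, H, W))         # stand-in video frames
masks = rng.integers(0, 2, size=(T, H, W)).astype(frames.dtype)  # assumed binary masks
snapshot = sci_forward(frames, masks)
print(snapshot.shape)
```

Reconstruction has to recover all T frames from the single (H, W) snapshot and the known masks, which is why the problem is ill-posed.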

Code & Models

Repositories

pwangcs/hisvit · PyTorch · Official

Taxonomy

Topics: Advanced MRI Techniques and Applications · Sparse and Compressive Sensing Techniques · Photoacoustic and Ultrasonic Imaging

Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections