Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Ping Wang, Yulun Zhang, Lishun Wang, and Xin Yuan

TL;DR
This paper introduces HiSViT, a hierarchical separable video transformer for snapshot compressive imaging (SCI) reconstruction. It models multi-scale spatial interactions at reduced computational cost and outperforms previous methods.
Contribution
The paper proposes a hierarchical separable Transformer architecture, built from CSS-MSA and GSM-FFN, for efficient multi-scale video reconstruction in SCI. The design is motivated by the SCI degradation itself, which previous Transformers did not take into account.
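To make the word "separable" concrete, here is a minimal NumPy sketch of generic factorized attention: spatial self-attention within each frame, followed by temporal self-attention at each spatial location. This is not the paper's CSS-MSA. The function names, the identity projections, the single head, and the tensor layout `(T, N, d)` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention over the second-to-last axis.
    # Identity Q/K/V projections are an illustrative simplification.
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def separable_attention(x):
    # x has shape (T, N, d): T frames, N spatial tokens per frame, d channels.
    x = x + self_attention(x)        # spatial: N x N attention inside each frame
    xt = np.swapaxes(x, 0, 1)        # (N, T, d)
    xt = xt + self_attention(xt)     # temporal: T x T attention at each location
    return np.swapaxes(xt, 0, 1)     # back to (T, N, d)

def pair_counts(T, N):
    # Number of attention scores: joint space-time vs. factorized.
    joint = (T * N) ** 2
    separable = T * N * N + N * T * T
    return joint, separable

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 4))
out = separable_attention(x)
joint, separable = pair_counts(8, 16)
```

In this toy setting the factorized form computes `T*N*N + N*T*T` attention scores instead of `(T*N)**2` for joint space-time attention, which is the usual efficiency argument for separating the two operations.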
Findings
Outperforms previous methods by >0.5 dB in PSNR.
Uses fewer parameters and lower complexity.
Effective multi-scale spatial-temporal modeling.
Abstract
Transformers have achieved state-of-the-art performance in solving the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack insight into the degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstruction architecture without temporal aggregation in early layers, with the Hierarchical Separable Video Transformer (HiSViT) as its building block. HiSViT is built from multiple groups of Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and Gated Self-Modulated Feed-Forward Network (GSM-FFN) with dense connections, each of which operates within a separate channel portion at a different scale, for multi-scale interactions and long-range modeling. By separating spatial operations from temporal…
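The "mixed degradation of spatial masking and temporal aliasing" can be illustrated with the standard video SCI measurement model: each frame is multiplied element-wise by its own mask, and the masked frames are summed into a single 2D snapshot. The sketch below assumes binary random masks, noiseless measurements, and toy dimensions; the function name `sci_measure` is hypothetical and not taken from the paper.

```python
import numpy as np

def sci_measure(video, masks):
    """Compress T frames into one snapshot.

    video, masks: arrays of shape (T, H, W).
    Returns an array of shape (H, W).
    """
    if video.shape != masks.shape:
        raise ValueError("video and masks must have the same shape")
    # Spatial masking: element-wise product with a per-frame mask.
    # Temporal aliasing: all masked frames are summed into one image.
    return (video * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 8, 4, 4                      # toy sizes, chosen for illustration
video = rng.random((T, H, W))          # pixel values in [0, 1)
masks = rng.integers(0, 2, size=(T, H, W)).astype(video.dtype)
snapshot = sci_measure(video, masks)
```

Reconstruction has to recover all `T * H * W` unknowns from only `H * W` measurements plus the known masks, which is why the inverse problem is ill-posed.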
Taxonomy
Topics: Advanced MRI Techniques and Applications · Sparse and Compressive Sensing Techniques · Photoacoustic and Ultrasonic Imaging
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections
