Modulo Video Recovery via Selective Spatiotemporal Vision Transformer

Tianyu Geng; Feng Ji; Wee Peng Tay

arXiv:2511.07479 · cs.CV · November 12, 2025

Open Access

TL;DR

This paper introduces SSViT, a novel deep learning framework using a selective spatiotemporal transformer for high-quality modulo video reconstruction, outperforming previous methods in efficiency and accuracy.

Contribution

We develop the first deep learning-based modulo video recovery method employing a selective spatiotemporal transformer architecture.

Findings

01. SSViT achieves state-of-the-art performance in modulo video reconstruction.

02. The token selection strategy improves computational efficiency and focus.

03. High-quality reconstructions are possible from 8-bit folded videos.

Abstract

Conventional image sensors have limited dynamic range, causing saturation in high-dynamic-range (HDR) scenes. Modulo cameras address this by folding incident irradiance into a bounded range, yet require specialized unwrapping algorithms to reconstruct the underlying signal. Unlike HDR recovery, which extends dynamic range from conventional sampling, modulo recovery restores actual values from folded samples. Despite being introduced over a decade ago, progress in modulo image recovery has been slow, especially in the use of modern deep learning techniques. In this work, we demonstrate that standard HDR methods are unsuitable for modulo recovery. Transformers, however, can capture global dependencies and spatial-temporal relationships crucial for resolving folded video frames. Still, adapting existing Transformer architectures for modulo recovery demands novel techniques. To this end, we…
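To make the folding operation concrete, the following is a minimal sketch of how an idealized modulo sensor wraps irradiance into a bounded range, together with a classical one-dimensional unwrapping baseline. It is not the SSViT method described in the paper. The function names `fold` and `unwrap_1d`, the 8-bit range, and the smoothness assumption (neighbouring true samples differ by less than half the range, and the first sample is not folded) are all assumptions made for illustration.

```python
import numpy as np


def fold(irradiance, bits=8):
    """Wrap values into [0, 2**bits), as an idealized modulo sensor would.

    Hypothetical helper for illustration; real sensors also add noise
    and quantization, which are ignored here.
    """
    return np.mod(irradiance, 2 ** bits)


def unwrap_1d(folded, bits=8):
    """Classical difference-based unwrapping of a 1D signal.

    Assumes the first sample is not folded and that consecutive true
    samples differ by less than half the range (2**bits / 2). This is
    a textbook baseline, not the learned recovery used in the paper.
    """
    period = 2 ** bits
    diffs = np.diff(folded)
    # Map each difference back into [-period/2, period/2).
    wrapped = (diffs + period / 2) % period - period / 2
    # Integrate the corrected differences, starting from the first sample.
    return folded[0] + np.concatenate(([0.0], np.cumsum(wrapped)))


# Usage: a smooth ramp that exceeds the 8-bit range several times.
signal = np.linspace(0.0, 1000.0, 200)
folded = fold(signal)
recovered = unwrap_1d(folded)
assert np.allclose(recovered, signal)
```

The baseline fails as soon as the true signal jumps by more than half the range between neighbouring samples, which is common at sharp edges in HDR scenes. This kind of ambiguity is one reason the global and spatiotemporal context mentioned in the abstract matters for resolving folded frames.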

Taxonomy

Topics: Image Enhancement Techniques · CCD and CMOS Imaging Sensors · Advanced Image Processing Techniques