Lightweight Quad Bayer HybridEVS Demosaicing via State Space Augmented Cross-Attention

Shiyang Zhou; Haijin Zeng; Yunfan Lu; Yongyong Chen; Jie Liu; Jingyong Su

arXiv:2508.06058 · cs.CV · August 11, 2025

TL;DR

This paper introduces TSANet, a lightweight two-stage neural network with state space augmented cross-attention for demosaicing HybridEVS event camera data. It outperforms DemosaicFormer in PSNR and SSIM while using fewer parameters and less computation.

Contribution

The paper proposes a novel lightweight two-stage network with state space augmented cross-attention for event-based demosaicing, improving accuracy and efficiency on mobile devices.

Findings

01

Outperforms DemosaicFormer in PSNR and SSIM across seven datasets.

02

Reduces parameter count by a factor of 1.86 and computation by a factor of 3.29.

03

Demonstrates effective demosaicing on both simulated and real HybridEVS data.
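
To put the second finding in relative terms, the short sketch below converts a reduction factor into the fraction of the baseline cost that remains. It assumes that "reduces by k times" means baseline cost divided by the new cost equals k, which the summary does not state explicitly. The function name `remaining_fraction` is illustrative and does not come from the paper.

```python
def remaining_fraction(reduction_factor: float) -> float:
    """Fraction of the baseline cost left after a k-fold reduction.

    ASSUMPTION: "reduced by k times" is read as baseline / new == k.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


# Factors quoted in the findings above.
for name, factor in [("parameters", 1.86), ("computation", 3.29)]:
    kept = remaining_fraction(factor)
    print(f"{name}: {kept:.1%} of baseline kept, {1 - kept:.1%} saved")
```

Under that reading, roughly 53.8% of the parameters and 30.4% of the computation remain.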

Abstract

Event cameras such as the Hybrid Event-based Vision Sensor (HybridEVS) capture brightness changes as asynchronous "events" instead of frames, enabling advanced applications in mobile photography. However, combining a Quad Bayer Color Filter Array (CFA) sensor with event pixels that lack color information causes aliasing and artifacts in the demosaicing process that precedes downstream applications. Current methods struggle to address these issues, especially on resource-limited mobile devices. In response, we introduce TSANet, a lightweight Two-stage network via State space augmented cross-Attention, which handles event-pixel inpainting and demosaicing separately, leveraging the benefits of dividing a complex task into manageable subtasks. Furthermore, we introduce a lightweight Cross-Swin State Block that uniquely utilizes…
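
The input problem described in the abstract can be illustrated with a minimal, pure-Python sketch that samples an RGB image through a Quad Bayer CFA and blanks out event-pixel sites. This is not the authors' data pipeline. The RGGB phase of the tile and the event-pixel layout (one site per 4x4 tile) are assumptions made only for illustration, and all names (`QUAD_BAYER_TILE`, `is_event_pixel`, `mosaic`) are hypothetical.

```python
# 4x4 Quad Bayer tile: each colour of a Bayer tile is expanded to a 2x2 block.
# ASSUMPTION: RGGB phase; real sensors may start on a different colour.
QUAD_BAYER_TILE = [
    "RRGG",
    "RRGG",
    "GGBB",
    "GGBB",
]
CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}


def quad_bayer_channel(row, col):
    """Colour letter sampled by the CFA at pixel (row, col)."""
    return QUAD_BAYER_TILE[row % 4][col % 4]


def is_event_pixel(row, col):
    # ASSUMPTION: hypothetical layout with one event pixel at the top-left
    # corner of every 4x4 tile; the actual HybridEVS layout is not given here.
    return row % 4 == 0 and col % 4 == 0


def mosaic(rgb):
    """Sample an H x W grid of (r, g, b) tuples through the CFA.

    Returns (raw, mask): raw holds one colour sample per pixel, or None at
    event-pixel sites; mask is True where the colour value is missing.
    """
    raw, mask = [], []
    for r, line in enumerate(rgb):
        raw_row, mask_row = [], []
        for c, pixel in enumerate(line):
            if is_event_pixel(r, c):
                raw_row.append(None)   # no colour information at this site
                mask_row.append(True)
            else:
                raw_row.append(pixel[CHANNEL_INDEX[quad_bayer_channel(r, c)]])
                mask_row.append(False)
        raw.append(raw_row)
        mask.append(mask_row)
    return raw, mask


image = [[(10, 20, 30)] * 4 for _ in range(4)]
raw, mask = mosaic(image)
for row in raw:
    print(row)
```

In a two-stage design like the one the abstract describes, the first stage would fill the `None` sites (event-pixel inpainting) and the second would reconstruct full RGB from the completed Quad Bayer mosaic (demosaicing).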

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 3

Strengths

• The proposed RVSS module demonstrates effectiveness by significantly reducing model parameters while maintaining competitive performance levels. This efficiency can be particularly advantageous for resource-constrained environments, enabling the deployment of complex vision models on devices with limited computational power. The parameter reduction, achieved without notable sacrifices in accuracy or quality, highlights the RVSS module’s potential for scalability and its suitability for lightwe

Weaknesses

• The paper's relevance to event-based vision is unclear, as it lacks components specifically tailored for processing event signals. There is no dedicated mechanism or module designed to leverage the unique properties of event-based input. This raises questions about the paper’s contributions to event-based vision specifically.
• The paper does not detail the loss function or the two-stage training strategy, both of which are crucial for understanding the network’s optimization and performance

Reviewer 02 · Rating 3 · Confidence 5

Strengths

S1. The research on lightweight hybrid event camera demosaicing architectures holds significant potential for advancing the field of event cameras. In the experiments, the proposed TSANet-s markedly outperforms the SOTA while maintaining the lowest parameter count and complexity.
S2. Integrating SSM with window attention is an effective approach, as it substantially reduces model complexity while balancing both global and local information.
S3. The authors incorporated

Weaknesses

W1: The combination of state-space models with attention does not appear sufficiently novel. Additionally, the effects of the proposed QCSA and SPA, as shown in the ablation study, are minimal.
W2: The paper lacks an in-depth discussion of integrating the Quad Bayer pattern's positional information. This aspect should be one of the primary focuses.
W3: On Page 4, Line 215, previous studies have shown that pretraining sub-networks can improve performance and inference stability, yet there is no citation

Reviewer 03 · Rating 3 · Confidence 4

Strengths

* The proposed two-stage network structure design seems to be effective in handling this kind of data.
* The proposed method requires fewer parameters, which could make it efficient to deploy on resource-limited mobile devices.

Weaknesses

* The motivation for proposing a two-stage network structure design is not that clear. As shown in Line 200, the authors say that "all-in-one models often struggle to extract the inner connection between position and color", but do not provide any explanation or proof. They only point readers to the experimental results in Fig. 6. It seems more like story-telling (e.g., "in our experiments we found that doing xx could be better than doing yy") than an in-depth analysis on

Reviewer 04 · Rating 5 · Confidence 3

Strengths

The paper explores a new and important research topic: demosaicing for Hybrid Event-based Vision Sensors (HybridEVS). This is a valuable area of study given the increasing interest in event vision (MIPI Demosaic 2024). The proposed TSANet introduces a lightweight network, which has the potential to be applied on mobile devices. However, the authors have not conducted experiments to validate its performance on edge computing (challenges iii).

Weaknesses

Please refer to Summary

Taxonomy

Topics: Blind Source Separation Techniques · Image and Signal Denoising Methods · Industrial Vision Systems and Defect Detection