SpikeGen: Decoupled "Rods and Cones" Visual Representation Processing with Latent Generative Framework

Gaole Dai; Menghang Dong; Rongyu Zhang; Ruichuan An; Shanghang Zhang; Tiejun Huang

arXiv:2505.18049 · cs.CV · October 2, 2025

TL;DR

SpikeGen emulates human visual processing by integrating decoupled motion and color inputs using a latent generative framework, improving multi-modal visual tasks like deblurring and scene synthesis.

Contribution

SpikeGen introduces a framework that combines decoupled visual modalities (color from RGB cameras, motion from spike cameras) with latent-space generative models for multi-modal visual processing.

Findings

01. Effective in spike-RGB tasks such as image/video deblurring

02. Improves dense frame reconstruction from spike streams

03. Enhances high-speed scene view synthesis
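
As background for the spike-stream findings above, the sketch below shows what a spike stream is and how a naive baseline recovers a frame from it. It assumes the standard integrate-and-fire model of a spike camera: each pixel accumulates incoming light and emits a binary spike when a threshold is reached. The reconstruction simply averages the firing rate over a window. This is not SpikeGen's method, which relies on a latent generative model; the function names `simulate_spikes` and `reconstruct_by_rate`, the threshold value, and the toy scene are all hypothetical and chosen only for illustration.

```python
def simulate_spikes(intensity, steps, threshold=1.0):
    """Simulate an integrate-and-fire spike stream for a static scene.

    intensity: 2D list of per-pixel light intensity per time step.
    Returns a list of `steps` binary frames (1 = spike fired).
    """
    height, width = len(intensity), len(intensity[0])
    accumulator = [[0.0] * width for _ in range(height)]
    stream = []
    for _ in range(steps):
        frame = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                accumulator[y][x] += intensity[y][x]
                if accumulator[y][x] >= threshold:
                    # Fire a spike and subtract the threshold from the accumulator.
                    accumulator[y][x] -= threshold
                    frame[y][x] = 1
        stream.append(frame)
    return stream


def reconstruct_by_rate(stream, threshold=1.0):
    """Estimate intensity as threshold * (spike count / window length)."""
    steps = len(stream)
    height, width = len(stream[0]), len(stream[0][0])
    return [
        [threshold * sum(frame[y][x] for frame in stream) / steps
         for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 static scene; brighter pixels fire more often.
scene = [[0.25, 0.5], [0.0, 1.0]]
spike_stream = simulate_spikes(scene, steps=8)
estimate = reconstruct_by_rate(spike_stream)
# estimate == [[0.25, 0.5], [0.0, 1.0]]
```

Rate averaging only works well when the scene is static over the window. With fast motion, a long window blurs the estimate and a short one makes it noisy, which is the kind of trade-off that learned reconstruction methods aim to avoid.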

Abstract

The process through which humans perceive and learn visual representations in dynamic environments is highly complex. From a structural perspective, the human eye decouples the functions of cone and rod cells: cones are primarily responsible for color perception, while rods are specialized in detecting motion, particularly variations in light intensity. These two distinct modalities of visual information are integrated and processed within the visual cortex, thereby enhancing the robustness of the human visual system. Inspired by this biological mechanism, modern hardware systems have evolved to include not only color-sensitive RGB cameras but also motion-sensitive Dynamic Visual Systems, such as spike cameras. Building upon these advancements, this study seeks to emulate the human visual system by integrating decomposed multi-modal visual inputs with modern latent-space generative…

Taxonomy

Methods: Softmax · Attention Is All You Need