SpikeGen: Decoupled "Rods and Cones" Visual Representation Processing with Latent Generative Framework
Gaole Dai, Menghang Dong, Rongyu Zhang, Ruichuan An, Shanghang Zhang, Tiejun Huang

TL;DR
SpikeGen emulates human visual processing by integrating decoupled motion and color inputs using a latent generative framework, improving multi-modal visual tasks like deblurring and scene synthesis.
Contribution
The paper introduces a framework that combines decoupled visual modalities (motion-sensitive spike streams and color-sensitive RGB frames) with latent-space generative models for multi-modal visual processing.
Findings
Effective in spike-RGB tasks such as image/video deblurring
Improves dense frame reconstruction from spike streams
Enhances high-speed scene view synthesis
Abstract
The process through which humans perceive and learn visual representations in dynamic environments is highly complex. From a structural perspective, the human eye decouples the functions of cone and rod cells: cones are primarily responsible for color perception, while rods are specialized in detecting motion, particularly variations in light intensity. These two distinct modalities of visual information are integrated and processed within the visual cortex, thereby enhancing the robustness of the human visual system. Inspired by this biological mechanism, modern hardware systems have evolved to include not only color-sensitive RGB cameras but also motion-sensitive Dynamic Visual Systems, such as spike cameras. Building upon these advancements, this study seeks to emulate the human visual system by integrating decomposed multi-modal visual inputs with modern latent-space generative…
Taxonomy
Methods: Softmax · Attention Is All You Need
