Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events
Lin Zhu, Ruonan Liu, Xiao Wang, Lizhi Wang, Hua Huang

TL;DR
This paper introduces a physics-inspired self-supervised pre-training framework for event camera data, effectively revealing latent information like edges and textures, and improving performance on multiple vision tasks despite noise and sparsity.
Contribution
It proposes a novel three-stage pre-training framework that enhances feature extraction from noisy, sparse event data, outperforming existing methods across various tasks.
Findings
Outperforms state-of-the-art methods on object recognition
Improves semantic segmentation accuracy
Enhances optical flow estimation robustness
Abstract
The event camera, a novel neuromorphic vision sensor, records data with high temporal resolution and wide dynamic range, offering new possibilities for accurate visual representation in challenging scenarios. However, event data is inherently sparse and noisy and mainly reflects brightness changes, which complicates effective feature extraction. To address this, we propose a self-supervised pre-training framework that fully reveals the latent information in event data, including edge information and texture cues. Our framework consists of three stages. Difference-guided Masked Modeling, inspired by the physical sampling process of events, reconstructs temporal intensity difference maps to extract enhanced information from raw event data. Backbone-fixed Feature Transition contrasts event and image features without updating the backbone, to preserve the representations learned from masked modeling and…
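To make the "event physical sampling process" more concrete, the following is a minimal sketch of the standard idealized event-generation model: a pixel emits an event of polarity +1 or -1 each time its log intensity changes by a contrast threshold, so summing signed events per pixel approximates the temporal log-intensity difference between two timestamps. This is not the authors' implementation. The function names, the threshold value of 0.2, and the noise-free two-frame simulation are all assumptions made for illustration only.

```python
import math


def simulate_events(frame_a, frame_b, contrast_threshold=0.2, eps=1e-3):
    """Idealized, noise-free event simulation between two intensity frames.

    Hypothetical helper: emits one (x, y, t, polarity) tuple for every full
    multiple of the contrast threshold crossed by the log-intensity change.
    """
    events = []
    for y, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            delta = math.log(b + eps) - math.log(a + eps)
            count = int(abs(delta) / contrast_threshold)
            polarity = 1 if delta > 0 else -1
            # Timestamps are set to 0.0 because this sketch ignores timing.
            events.extend((x, y, 0.0, polarity) for _ in range(count))
    return events


def events_to_difference_map(events, height, width, contrast_threshold=0.2):
    """Accumulate signed events into an approximate log-intensity difference map."""
    diff = [[0.0] * width for _ in range(height)]
    for x, y, _t, polarity in events:
        diff[y][x] += polarity * contrast_threshold
    return diff


if __name__ == "__main__":
    # Two tiny 2x2 intensity frames (assumed values, in arbitrary units).
    frame_a = [[0.1, 0.1], [0.1, 0.1]]
    frame_b = [[0.1, 0.4], [0.1, 0.05]]
    events = simulate_events(frame_a, frame_b)
    diff_map = events_to_difference_map(events, height=2, width=2)
    for row in diff_map:
        print([round(v, 2) for v in row])
```

In this toy setting the accumulated map is nonzero only where brightness changed, and it is quantized to multiples of the threshold, which illustrates why real event data is sparse and why a reconstruction target based on intensity differences is a natural fit for it. Real sensors add noise and per-pixel threshold variation, which this sketch deliberately omits.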
