EZSR: Event-based Zero-Shot Recognition
Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu

TL;DR
This paper introduces EZSR, a novel event-based zero-shot object recognition method that leverages a new event encoder and data synthesis techniques to outperform existing approaches on benchmark datasets.
Contribution
The study develops an event encoder without reconstruction networks, analyzes performance bottlenecks, and proposes a scalar-wise modulation strategy for improved zero-shot recognition.
Findings
Achieves 47.84% zero-shot accuracy on N-ImageNet with ViT/B-16.
Demonstrates superior performance over previous methods on standard benchmarks.
Shows effective scaling with increased parameters and synthesized data.
Abstract
This paper studies zero-shot object recognition using event camera data. Guided by CLIP, which is pre-trained on RGB images, existing approaches achieve zero-shot object recognition by optimizing embedding similarities between event data and RGB images respectively encoded by an event encoder and the CLIP image encoder. Alternatively, several methods learn RGB frame reconstructions from event data for the CLIP image encoder. However, they often result in suboptimal zero-shot performance. This study develops an event encoder without relying on additional reconstruction networks. We theoretically analyze the performance bottlenecks of previous approaches: the embedding optimization objectives are prone to suffer from the spatial sparsity of event data, causing semantic misalignments between the learned event embedding space and the CLIP text embedding space. To mitigate the issue, we…
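The embedding-similarity idea described for existing approaches can be sketched in a few lines. This is a minimal illustration, not the EZSR method or any published implementation. It assumes the event encoder, the CLIP image encoder, and the CLIP text encoder have already produced embeddings, and it uses a cosine-distance loss as one plausible form of the similarity objective. All function names and the toy vectors are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def alignment_loss(event_emb, image_emb):
    """Assumed training objective: pull the event embedding toward the
    embedding of the paired RGB image (1 - cosine similarity)."""
    return 1.0 - cosine_similarity(event_emb, image_emb)

def zero_shot_predict(event_emb, text_embs):
    """Pick the class whose text embedding is most similar to the event embedding.

    text_embs maps a class label to its (hypothetical) text embedding.
    """
    scores = {label: cosine_similarity(event_emb, emb)
              for label, emb in text_embs.items()}
    return max(scores, key=scores.get), scores

# Toy 3-dimensional embeddings standing in for real encoder outputs.
text_embs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
event_emb = [0.9, 0.1, 0.0]   # hypothetical event-encoder output
image_emb = [1.0, 0.0, 0.0]   # hypothetical CLIP image-encoder output

label, scores = zero_shot_predict(event_emb, text_embs)
loss = alignment_loss(event_emb, image_emb)
```

In this toy setup the event embedding is trained against the image embedding but evaluated against text embeddings. That gap is where the abstract locates the semantic misalignment: an event embedding can be close to its paired image embedding and still be poorly placed relative to the class text embeddings.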
Taxonomy
Topics: Medical Imaging Techniques and Applications · Radiation Detection and Scintillator Technologies · Nuclear Physics and Applications
Methods: Contrastive Language-Image Pre-training
