Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models
Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo

TL;DR
This paper introduces SAFE, a novel pattern recognition framework that leverages large-scale vision-language models to fuse RGB frames, event streams, and semantic labels, improving accuracy in event-based pattern recognition tasks.
Contribution
The study proposes a new multimodal fusion framework using pre-trained CLIP models and Transformer networks to effectively integrate visual and semantic information for pattern recognition.
Findings
Outperforms existing methods on HARDVS and PokerEvent datasets.
Effectively integrates semantic labels with visual data using large-scale models.
Demonstrates significant improvement in recognition accuracy.
Abstract
Pattern recognition through the fusion of RGB frames and event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition. However, we posit that these methods may suffer from key issues like semantic gaps and small-scale backbone networks. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision-language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we initially convert them into language…
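The data flow the abstract describes (features from RGB frames, features from event streams, and embeddings of the semantic labels brought together for recognition) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the feature vectors are hand-written stand-ins for CLIP encoder outputs, the hypothetical `fuse` helper uses a plain element-wise mean where the paper uses Transformer networks, and the hypothetical `classify` helper scores labels by cosine similarity in the style of CLIP zero-shot classification, which may differ from the paper's recognition head.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(rgb_feat, event_feat):
    """Hypothetical fusion: element-wise mean of the two modality features.

    Assumption: this is only a placeholder for the Transformer-based
    fusion mentioned in the summary.
    """
    return [(r + e) / 2.0 for r, e in zip(rgb_feat, event_feat)]


def classify(rgb_feat, event_feat, label_feats):
    """Score each label by cosine similarity with the fused visual feature.

    Assumption: CLIP-style similarity scoring, used here for illustration.
    """
    fused = l2_normalize(fuse(rgb_feat, event_feat))
    scores = {}
    for label, feat in label_feats.items():
        text = l2_normalize(feat)
        scores[label] = sum(a * b for a, b in zip(fused, text))
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for encoder outputs (real CLIP features have hundreds of dimensions).
rgb_feature = [1.0, 0.0, 0.2]
event_feature = [0.8, 0.2, 0.0]
label_features = {
    "walking": [1.0, 0.0, 0.0],
    "jumping": [0.0, 1.0, 0.0],
}

best_label, label_scores = classify(rgb_feature, event_feature, label_features)
print(best_label, label_scores)
```

In a real pipeline the three toy vectors would come from a pre-trained vision-language model, and the mean would be replaced by a learned fusion module; the sketch only shows how the three inputs meet in a shared embedding space.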
Taxonomy
Topics: Currency Recognition and Detection · Brain Tumor Detection and Classification · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings
