Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models

Dong Li; Jiandong Jin; Yuhao Zhang; Yanlin Zhong; Yaoyang Wu; Lan Chen; Xiao Wang; Bin Luo

arXiv:2311.18592 · cs.CV · December 1, 2023 · 1 citation

TL;DR

This paper introduces SAFE, a novel pattern recognition framework that leverages large-scale vision-language models to fuse RGB frames, event streams, and semantic labels, improving accuracy in event-based pattern recognition tasks.

Contribution

The study proposes a new multimodal fusion framework using pre-trained CLIP models and Transformer networks to effectively integrate visual and semantic information for pattern recognition.

Findings

01 Outperforms existing methods on HARDVS and PokerEvent datasets.

02 Effectively integrates semantic labels with visual data using large-scale models.

03 Demonstrates significant improvement in recognition accuracy.

Abstract

Pattern recognition through the fusion of RGB frames and event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to extract the features of RGB frames and event streams individually, and subsequently fuse these features for pattern recognition. However, we posit that these methods may suffer from key issues such as semantic gaps and small-scale backbone networks. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision-language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we initially convert them into language…
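The abstract is truncated here, so the exact fusion design is not visible on this page. As a rough illustration of the general idea only (tokens from several modalities mixed by a Transformer-style attention step), here is a minimal sketch. Everything in it is an assumption made for illustration: the feature vectors are hypothetical stand-ins rather than real CLIP outputs, the names `rgb_feature`, `event_feature`, and `label_features` are invented, and the attention uses identity projections instead of learned weights. It is not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    Assumption: a real Transformer layer adds learned Q/K/V projections,
    multiple heads, residual connections, and layer normalization.
    They are omitted to keep the sketch short.
    """
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        fused.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return fused


# Hypothetical stand-ins for encoder outputs (NOT real CLIP features).
rgb_feature = [0.9, 0.1, 0.0, 0.2]
event_feature = [0.8, 0.0, 0.1, 0.3]
label_features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

# Treat every modality as a token in one sequence and let attention mix them.
tokens = [rgb_feature, event_feature] + label_features
fused_tokens = self_attention(tokens)

for name, token in zip(["rgb", "event", "label_0", "label_1"], fused_tokens):
    print(name, [round(v, 3) for v in token])
```

Because each output token is a weighted average of all input tokens, the RGB and event tokens end up carrying some information from the label tokens and vice versa, which is the sense in which visual and semantic information are integrated in this toy setting.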

Code & Models

Repositories

event-ahu/safe_largevlm (PyTorch, official)

Taxonomy

Topics: Currency Recognition and Detection · Brain Tumor Detection and Classification · Anomaly Detection Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings