Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Xiao Wang, Yao Rong, Shiao Wang, Yuan Chen, Zhe Wu, Bo Jiang, Yonghong Tian, Jin Tang

TL;DR
This paper introduces TSCFormer, a lightweight CNN-Transformer framework that effectively combines local and global features for RGB-Event video recognition, achieving a good balance between accuracy and model complexity.
Contribution
The paper proposes a novel CNN-Transformer model, TSCFormer, that fuses RGB and Event data using global tokens and interactive modules, improving recognition performance while maintaining simplicity.
Findings
Validated on large-scale RGB-Event datasets PokerEvent and HARDVS
Achieved superior accuracy with fewer parameters compared to existing methods
Demonstrated effective global-local feature fusion for video recognition
Abstract
Pattern recognition based on RGB-Event data is a newly emerging research topic, and previous works usually learn their features using a CNN or a Transformer. CNNs capture local features well, while cascaded self-attention mechanisms are good at extracting long-range global relations. It is intuitive to combine them for high-performance RGB-Event based video recognition; however, existing works fail to achieve a good balance between accuracy and model parameters, as shown in Fig.~\ref{firstimage}. In this work, we propose a novel RGB-Event based recognition framework termed TSCFormer, which is a relatively lightweight CNN-Transformer model. Specifically, we mainly adopt the CNN as the backbone network to first encode both RGB and Event data. Meanwhile, we initialize global tokens as the input and fuse them with RGB and Event features using the BridgeFormer module. It…
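The fusion idea in the abstract (global tokens interacting with RGB and Event features) can be illustrated with a small sketch. This is not the authors' BridgeFormer module, whose details are not given here. It is a generic single-head cross-attention without learned projections, written in plain Python under the assumption that the global tokens act as queries over the concatenated features of both modalities. All function names, shapes, and values below are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    # Scaled dot-product cross-attention; keys and values are the same vectors
    # (an illustrative simplification: no learned Q/K/V projections).
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values))
                    for i in range(dim)])
    return out

def fuse_with_global_tokens(global_tokens, rgb_feats, event_feats):
    # Hypothetical fusion step: each global token attends over the features
    # of both modalities, and the result is added back as a residual.
    attended = cross_attend(global_tokens, rgb_feats + event_feats)
    return [[t + a for t, a in zip(tok, att)]
            for tok, att in zip(global_tokens, attended)]

# Toy inputs: two RGB feature vectors, one Event feature vector, two global tokens.
rgb_feats = [[1.0, 0.0], [0.5, 0.5]]
event_feats = [[0.0, 1.0]]
global_tokens = [[0.0, 0.0], [1.0, 1.0]]

fused = fuse_with_global_tokens(global_tokens, rgb_feats, event_feats)
print(fused)
```

In a real model the backbone features would come from a CNN and the attention would use learned projections and multiple heads. The sketch only shows how a small set of tokens can aggregate information from both modalities at a cost that scales with the number of tokens rather than with full pairwise attention between all features.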
Taxonomy
Topics: Brain Tumor Detection and Classification · Advanced Memory and Neural Computing · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Adam · Layer Normalization · Residual Connection
