Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification
Chengguo Yuan, Yu Jin, Zongzhen Wu, Fanting Wei, Yangzirui Wang, Lan Chen, and Xiao Wang

TL;DR
This paper introduces a dual-stream Transformer and GNN-based framework for event image and voxel data fusion, significantly improving event-based object classification accuracy.
Contribution
It proposes a novel dual-stream architecture with a bottleneck Transformer for effective fusion of event images and voxels, enhancing feature representation.
Findings
Achieves state-of-the-art results on event classification datasets
Effectively models spatial and 3D stereo information separately
Demonstrates superior performance over existing methods
Abstract
Recognizing target objects with event-based cameras has drawn increasing attention in recent years. Existing works usually represent event streams as point clouds, voxels, images, etc., and learn feature representations using various deep neural networks. Their final results may be limited by two factors: monotonous modal expressions and the design of the network structure. To address these challenges, this paper proposes a novel dual-stream framework for event representation, extraction, and fusion. This framework simultaneously models two common representations: event images and event voxels. By utilizing Transformer and Structured Graph Neural Network (GNN) architectures, spatial information and three-dimensional stereo information can be learned separately. Additionally, a bottleneck Transformer is introduced to facilitate the fusion of the dual-stream…
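The two representations named in the abstract, event images and event voxels, can be illustrated with a minimal sketch. It assumes each event is a tuple `(x, y, timestamp, polarity)`; the function names, the two-channel polarity layout, and the `bins` and `cell` parameters are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed event layout: (x, y, timestamp, polarity), with polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def events_to_image(events: List[Event], width: int, height: int) -> List[List[List[int]]]:
    """Accumulate events into a 2-channel count image: [positive, negative]."""
    image = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, p in events:
        channel = 0 if p > 0 else 1
        image[channel][y][x] += 1
    return image


def events_to_voxels(events: List[Event], bins: int, cell: int) -> Dict[Tuple[int, int, int], int]:
    """Count events per (time bin, y cell, x cell); only non-empty voxels are stored."""
    if not events:
        return {}
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero when all timestamps match
    voxels: Dict[Tuple[int, int, int], int] = {}
    for x, y, t, _p in events:
        # Clamp so the latest event falls into the last time bin.
        b = min(int((t - t0) / span * bins), bins - 1)
        key = (b, y // cell, x // cell)
        voxels[key] = voxels.get(key, 0) + 1
    return voxels
```

The image keeps only spatial structure, while the voxel dictionary also preserves coarse timing. A sparse dictionary is used here on the assumption that non-empty voxels would serve as graph nodes for a GNN stream; the paper may construct its voxels and graph differently.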
Taxonomy
Topics: Advanced X-ray and CT Imaging · Medical Imaging Techniques and Applications · Radiation Detection and Scintillator Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Graph Neural Network · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · 1x1 Convolution · Layer Normalization · Pointwise Convolution
