Reconstruction as a Bridge for Event-Based Visual Question Answering
Hanyue Lou, Jiayi Zhou, Yang Zhang, Boyu Li, Yi Wang, Guangnan Ye, Boxin Shi

TL;DR
This paper introduces reconstruction-based methods to integrate event cameras with multimodal large language models for improved event-based visual question answering, supported by a new real-world benchmark.
Contribution
It proposes the FRT and ART methods for bridging event data with frame-based models and introduces EvQA, the first real-world benchmark for event-based MLLMs.
Findings
The proposed methods achieve state-of-the-art performance on the EvQA benchmark.
Reconstruction proves effective as a bridge between event data and frame-based models in event-based VQA.
The results support the potential of MLLMs in event-based vision tasks.
Abstract
Integrating event cameras with Multimodal Large Language Models (MLLMs) promises general scene understanding in challenging visual conditions, yet requires navigating a trade-off between preserving the unique advantages of event data and ensuring compatibility with frame-based models. We address this challenge by using reconstruction as a bridge, proposing a straightforward Frame-based Reconstruction and Tokenization (FRT) method and designing an efficient Adaptive Reconstruction and Tokenization (ART) method that leverages event sparsity. For robust evaluation, we introduce EvQA, the first objective, real-world benchmark for event-based MLLMs, comprising 1,000 event-Q&A pairs from 22 public datasets. Our experiments demonstrate that our methods achieve state-of-the-art performance on EvQA, highlighting the significant potential of MLLMs in event-based vision.
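The abstract does not describe how FRT or ART work internally, so the following is only a generic illustration of the two ideas it names: turning an event stream into a frame-like array, and using event sparsity to skip empty regions. It is a minimal sketch, not the authors' method. The event layout `(x, y, t, p)`, the function names `accumulate_events` and `active_patches`, and the `patch` and `min_events` parameters are all assumptions made for this example.

```python
import numpy as np

def accumulate_events(events, height, width):
    """Sum event polarities per pixel into a 2D frame.

    Assumed layout: events has shape (N, 4) with columns (x, y, t, p),
    where p is +1 or -1. This is plain accumulation, not a learned
    reconstruction network.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.float32)
    # np.add.at handles repeated pixel indices correctly (unbuffered add).
    np.add.at(frame, (y, x), p)
    return frame

def active_patches(events, height, width, patch=4, min_events=1):
    """Return a boolean grid marking patches that contain enough events.

    Patches with too few events could be skipped before tokenization,
    which is one hypothetical way to exploit sparsity.
    """
    counts = np.zeros((height, width), dtype=np.int64)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    # Count events regardless of polarity so opposite signs do not cancel.
    np.add.at(counts, (y, x), 1)
    gh, gw = height // patch, width // patch
    per_patch = (
        counts[: gh * patch, : gw * patch]
        .reshape(gh, patch, gw, patch)
        .sum(axis=(1, 3))
    )
    return per_patch >= min_events

# Toy stream: three events in the top-left corner, one near the bottom-right.
events = np.array([
    [0, 0, 0.0, 1],
    [1, 0, 0.1, 1],
    [0, 0, 0.2, -1],
    [5, 6, 0.3, 1],
])
frame = accumulate_events(events, height=8, width=8)
mask = active_patches(events, height=8, width=8, patch=4)
print(int(mask.sum()), "of", mask.size, "patches contain events")
```

In this toy case only two of the four patches hold any events, so a sparsity-aware pipeline would have half as many regions to process as a dense frame-based one.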
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Memory and Neural Computing · Advanced Neural Network Applications
