EventGPT: Event Stream Understanding with Multimodal Large Language Models
Shaoyu Liu, Jianing Li, Guanghui Zhao, Yunjian Zhang, Xin Meng, Fei Richard Yu, Xiangyang Ji, Ming Li

TL;DR
EventGPT is a pioneering multimodal large language model designed specifically for understanding event streams from event cameras, addressing domain gaps through a three-stage training process and achieving superior performance in scene understanding tasks.
Contribution
This work introduces the first MLLM for event stream understanding, developing a novel three-stage optimization paradigm to bridge domain gaps and enhance event-based scene comprehension.
Findings
EventGPT outperforms previous state-of-the-art MLLMs in generation quality.
It achieves higher descriptive accuracy in event scene understanding.
The model demonstrates improved reasoning capabilities on benchmarks.
Abstract
Event cameras record visual information as asynchronous streams of per-pixel brightness changes, excelling at scene perception under poor lighting or highly dynamic conditions. Existing multimodal large language models (MLLMs) concentrate on natural RGB images and fail in scenarios where event data is better suited. In this paper, we introduce EventGPT, to the best of our knowledge the first MLLM for event stream understanding, marking a pioneering attempt to integrate large language models (LLMs) with event stream comprehension. To bridge the large domain gap, we develop a three-stage optimization paradigm that gradually equips a pre-trained LLM with the capability of understanding event-based scenes. EventGPT comprises an event encoder, followed by a spatio-temporal aggregator, a linear projector, an event-language adapter, and an LLM. Firstly, RGB image-text pairs generated by GPT are leveraged…
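The module chain named in the abstract (event encoder, spatio-temporal aggregator, linear projector, event-language adapter, LLM) can be pictured as a shape-level data flow. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The names `events_to_frames`, `patchify`, and `EventGPTSketch` are hypothetical. The temporal binning of (x, y, t, p) events, the random-projection patch encoder standing in for the event encoder, the mean-pooling aggregator, and the residual MLP adapter are all placeholders chosen only to show how a raw event stream could become a sequence of LLM-width tokens. The LLM itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def events_to_frames(events, height, width, num_bins):
    """Accumulate signed polarities of (x, y, t, p) events into num_bins time slices.

    Assumption: simple temporal binning is one common event representation;
    it is not taken from the paper.
    """
    frames = np.zeros((num_bins, height, width))
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = np.where(events[:, 3] > 0, 1.0, -1.0)  # ON -> +1, OFF -> -1
    # Map timestamps onto bin indices; the latest event is clipped into the last bin.
    span = max(t.max() - t.min(), 1e-9)
    b = np.minimum(((t - t.min()) / span * num_bins).astype(int), num_bins - 1)
    np.add.at(frames, (b, y, x), p)  # unbuffered add, so repeated pixels accumulate
    return frames


def patchify(frames, patch):
    """Split (T, H, W) frames into (T, num_patches, patch * patch) flattened patches."""
    T, H, W = frames.shape
    assert H % patch == 0 and W % patch == 0, "H and W must be divisible by patch"
    f = frames.reshape(T, H // patch, patch, W // patch, patch)
    f = f.transpose(0, 1, 3, 2, 4)  # (T, rows, cols, patch, patch)
    return f.reshape(T, (H // patch) * (W // patch), patch * patch)


class EventGPTSketch:
    """Hypothetical stand-ins for the modules listed in the abstract (random weights)."""

    def __init__(self, patch=4, d_enc=32, d_llm=64):
        self.patch = patch
        self.w_enc = rng.normal(0, 0.02, (patch * patch, d_enc))  # stand-in event encoder
        self.w_proj = rng.normal(0, 0.02, (d_enc, d_llm))  # linear projector
        self.w_ad1 = rng.normal(0, 0.02, (d_llm, d_llm))  # adapter, layer 1 (assumed MLP)
        self.w_ad2 = rng.normal(0, 0.02, (d_llm, d_llm))  # adapter, layer 2 (assumed MLP)

    def __call__(self, frames):
        patches = patchify(frames, self.patch)  # (T, N, patch*patch)
        feats = patches @ self.w_enc  # (T, N, d_enc) per-bin patch features
        agg = feats.mean(axis=0)  # (N, d_enc) aggregation: mean over time bins
        tokens = agg @ self.w_proj  # (N, d_llm) project to LLM width
        hidden = np.maximum(tokens @ self.w_ad1, 0.0)  # ReLU MLP
        return tokens + hidden @ self.w_ad2  # residual adapter output


# Usage: 1,000 random events on a 16x16 sensor, split into 5 time bins.
H, W, num_bins, n = 16, 16, 5, 1000
events = np.stack([
    rng.integers(0, W, n),               # x
    rng.integers(0, H, n),               # y
    np.sort(rng.uniform(0.0, 1.0, n)),   # t (seconds, ascending)
    rng.integers(0, 2, n),               # polarity (0 = OFF, 1 = ON)
], axis=1).astype(np.float64)
frames = events_to_frames(events, H, W, num_bins)
event_tokens = EventGPTSketch()(frames)
print(frames.shape, event_tokens.shape)  # (5, 16, 16) (16, 64)
```

Because the placeholder aggregator averages over time bins, each output token corresponds to one spatial patch. In a full model these tokens would be combined with text embeddings before the LLM. How EventGPT actually aggregates spatio-temporal features and feeds them to the LLM is not specified in this summary.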
Taxonomy
Topics: Advanced Text Analysis Techniques · Semantic Web and Ontologies · Data Quality and Management
Methods: Attention Is All You Need · Cosine Annealing · Dropout · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization · Linear Layer · Discriminative Fine-Tuning · Weight Decay
