Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition
Zongyou Yu, Qiang Qu, Xiaoming Chen, and Chen Wang

TL;DR
This paper investigates whether large language models such as GPT-4o and GPT-4 Turbo can perform event-based visual recognition in a zero-shot setting, and reports that they significantly outperform existing zero-shot methods without additional training.
Contribution
First study to evaluate LLMs for event-based recognition. Using prompt engineering and no additional training, the LLMs surpass state-of-the-art zero-shot methods.
Findings
GPT-4o outperforms other models on benchmark datasets.
LLMs achieve significant accuracy improvements with well-designed prompts.
Zero-shot recognition performance exceeds previous methods by large margins.
Abstract
Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods heavily depend on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. In particular, we evaluate the ability of GPT-4o, GPT-4 Turbo, and two open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs,…
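The abstract does not say how the event stream is presented to the LLM or how the prompts are worded. As a rough illustration only, the sketch below assumes events arrive as `(x, y, timestamp, polarity)` tuples, accumulates them into a signed 2D frame, and builds a closed-set zero-shot prompt from candidate class names. The function names, the prompt wording, and the answer-parsing rule are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity being +1 or -1.
Event = Tuple[int, int, float, int]


def accumulate_events(events: Iterable[Event], width: int, height: int) -> List[List[int]]:
    """Sum event polarities per pixel to obtain a signed 2D frame.

    Events that fall outside the sensor area are ignored.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, _timestamp, polarity in events:
        if 0 <= x < width and 0 <= y < height:
            frame[y][x] += 1 if polarity > 0 else -1
    return frame


def build_zero_shot_prompt(class_names: List[str]) -> str:
    """Build a closed-set recognition prompt (hypothetical wording)."""
    options = ", ".join(class_names)
    return (
        "The attached image was reconstructed from event-camera data, "
        "so it shows brightness changes rather than normal appearance. "
        f"Which one of these categories does it show: {options}? "
        "Answer with exactly one category name."
    )


def parse_answer(reply: str, class_names: List[str]) -> Optional[str]:
    """Return the class named in the reply, or None if zero or several match."""
    reply_lower = reply.lower()
    matches = [name for name in class_names if name.lower() in reply_lower]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    demo_events = [(0, 0, 0.0, 1), (0, 0, 0.1, 1), (1, 1, 0.2, -1)]
    print(accumulate_events(demo_events, width=2, height=2))
    print(build_zero_shot_prompt(["car", "airplane"]))
```

In a setup like this, the frame would be rendered as an image and sent to a multimodal model together with the prompt, and accuracy would be the fraction of samples for which `parse_answer` returns the ground-truth label. No model call is shown because the paper's actual interface is not described here.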
Taxonomy
Topics: Topic Modeling
Methods: Contrastive Language-Image Pre-training
