Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition

Zongyou Yu, Qiang Qu, Xiaoming Chen, and Chen Wang

arXiv:2409.09628·cs.CV·December 12, 2024

TL;DR

This paper investigates whether large language models such as GPT-4o can perform event-based visual recognition in a zero-shot setting, and shows that they can significantly outperform existing zero-shot methods without additional training.

Contribution

First study to evaluate LLMs for event-based recognition, demonstrating their effectiveness and surpassing state-of-the-art zero-shot methods using prompt engineering.

Findings

01 GPT-4o outperforms other models on benchmark datasets.

02 LLMs achieve significant accuracy improvements with well-designed prompts.

03 Zero-shot recognition performance exceeds previous methods by large margins.

Abstract

Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods heavily depend on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. Particularly, we evaluate the ability of GPT-4o / 4turbo and two other open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs,…
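
To make the pipeline in the abstract more concrete, here is a minimal Python sketch of how an event stream might be turned into an image and paired with a zero-shot prompt. This page does not state which event representation or prompts the authors use, so everything here is an assumption for illustration: the `(x, y, t, polarity)` event layout, the signed-count accumulation, the prompt wording, and the helper names `events_to_frame`, `build_prompt`, and `parse_answer` are all hypothetical. The call to an actual LLM is deliberately left out.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity > 0 means "on".
Event = Tuple[int, int, float, int]


def events_to_frame(events: Iterable[Event], width: int, height: int) -> List[List[int]]:
    """Accumulate signed polarities into a height x width count image.

    This is one common event representation, not necessarily the paper's.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, _t, polarity in events:
        # Skip events that fall outside the sensor area.
        if 0 <= x < width and 0 <= y < height:
            frame[y][x] += 1 if polarity > 0 else -1
    return frame


def build_prompt(class_names: List[str]) -> str:
    """Compose a zero-shot classification prompt over a fixed label set.

    The wording is illustrative only.
    """
    labels = ", ".join(class_names)
    return (
        "The attached image was reconstructed from an event camera. "
        f"Which one of these categories does it show: {labels}? "
        "Answer with exactly one category name."
    )


def parse_answer(reply: str, class_names: List[str]) -> Optional[str]:
    """Return the first label (in list order) found in the reply, else None."""
    lowered = reply.lower()
    for name in class_names:
        if name.lower() in lowered:
            return name
    return None
```

In a full pipeline, the frame would be rendered to an image, sent to the model together with the prompt, and the reply mapped back to a label with `parse_answer`; accuracy is then the fraction of samples whose parsed label matches the ground truth.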

Code & Models

Repositories

chrisyu-zz/pure-event-based-recognition-based-llm (Official)

Taxonomy

Topics: Topic Modeling

Methods: Contrastive Language-Image Pre-training