CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding

Wenhao Xu; Wenming Weng; Yueyi Zhang; Zhiwei Xiong

arXiv:2407.06611·cs.CV·July 10, 2024

TL;DR

CEIA introduces a contrastive learning framework that aligns event and image data via CLIP to enhance open-world event understanding, overcoming the scarcity of paired event-text data and improving performance across multiple applications.

Contribution

CEIA is the first to leverage event-image datasets to align event and text data through image-based contrastive learning, enabling scalable and versatile event understanding.

Findings

01. CEIA achieves state-of-the-art zero-shot performance in event recognition.

02. It effectively improves event-image and event-text retrieval tasks.

03. The framework demonstrates strong domain adaptation capabilities.

Abstract

We present CEIA, an effective framework for open-world event-based understanding. Currently, training a large event-text model still poses a huge challenge due to the shortage of paired event-text data. In response to this challenge, CEIA learns to align event and image data instead of directly aligning event and text data. Specifically, we leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP through contrastive learning. In this way, event and text data are naturally aligned by using image data as a bridge. In particular, CEIA offers two distinct advantages. First, it allows us to take full advantage of the existing event-image datasets to make up for the shortage of large-scale event-text datasets. Second, by leveraging more training data, it also exhibits the flexibility to boost performance, ensuring scalable…
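The alignment idea in the abstract can be illustrated with a small sketch of a symmetric contrastive (InfoNCE-style) loss over paired event and image embeddings. This is not the authors' implementation: the function names `l2_normalize` and `info_nce`, the temperature value of 0.07, and the use of a symmetric loss are all assumptions made for illustration, and the embeddings are plain Python lists standing in for the outputs of an event encoder and the CLIP image encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(event_embs, image_embs, temperature=0.07):
    """Symmetric InfoNCE loss for a batch where event i is paired with image i.

    Hypothetical helper: the temperature and the symmetric form are assumptions,
    not details taken from the paper.
    """
    ev = [l2_normalize(e) for e in event_embs]
    im = [l2_normalize(i) for i in image_embs]
    n = len(ev)

    # logits[i][j] = cosine similarity of event i and image j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(ev[i], im[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Transpose to get the image-to-event direction.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Toy batch: matched pairs should give a lower loss than mismatched pairs.
events = [[1.0, 0.0], [0.0, 1.0]]
images_matched = [[1.0, 0.0], [0.0, 1.0]]
images_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(events, images_matched)
loss_swapped = info_nce(events, images_swapped)
```

Minimizing a loss of this kind pulls each event embedding toward its paired image embedding and away from the other images in the batch. Because CLIP's image and text spaces are already aligned, an event embedding that lands near its image embedding can then be compared against text embeddings, which is the "image as a bridge" idea the abstract describes.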

Taxonomy

Topics: Image Retrieval and Classification Techniques · Biomedical Text Mining and Ontologies · Semantic Web and Ontologies

Methods: Contrastive Language-Image Pre-training · ALIGN