ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality

Guoliang Xu; Jianqin Yin; Feng Zhou; Yonghao Dang

arXiv:2407.19820·cs.CV·July 30, 2024

TL;DR

ActivityCLIP introduces a novel approach that leverages action label text information to enhance group activity recognition by supplementing image data, improving performance with minimal additional parameters.

Contribution

The paper proposes a plug-and-play framework that integrates text-based semantic information from action labels into existing image-based methods for better group activity recognition.

Findings

01. Improves accuracy of group activity recognition methods.

02. Achieves performance gains with minimal additional trainable parameters.

03. Demonstrates effectiveness across multiple baseline methods.

Abstract

Previous methods usually extract only the image modality's information to recognize group activity. However, mining image information is approaching saturation, making it difficult to extract richer information. Therefore, extracting complementary information from other modalities to supplement image information has become increasingly important. In fact, action labels provide clear text information that expresses the action's semantics, which existing methods often overlook. Thus, we propose ActivityCLIP, a plug-and-play method that mines the text information contained in the action labels to supplement the image information and enhance group activity recognition. ActivityCLIP consists of text and image branches, where the text branch is plugged into the image branch (the off-the-shelf image-based method). The text branch includes Image2Text and relation modeling modules. Specifically,…
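
The two-branch layout described in the abstract can be illustrated with a small toy sketch. The abstract is truncated before it explains the modules, so everything below is an assumption made for illustration and not the authors' implementation: the function names (`image2text`, `relate`, `text_branch_logits`, `fuse`), the similarity-weighted mapping into text space, the single self-attention step, the additive fusion weight `alpha`, and the hand-written vectors that stand in for frozen label text embeddings are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def image2text(actor_feat, label_embs, temperature=0.1):
    # ASSUMPTION: map one actor's image feature into text space as a
    # similarity-weighted mix of action-label embeddings (same dimension).
    f = normalize(actor_feat)
    w = softmax([dot(f, normalize(e)) / temperature for e in label_embs])
    dim = len(label_embs[0])
    return [sum(w[k] * label_embs[k][d] for k in range(len(label_embs)))
            for d in range(dim)]

def relate(text_feats):
    # ASSUMPTION: relation modeling as one dot-product self-attention
    # step across actors, with no learned parameters.
    out = []
    for q in text_feats:
        w = softmax([dot(q, k) for k in text_feats])
        out.append([sum(w[j] * text_feats[j][d] for j in range(len(text_feats)))
                    for d in range(len(q))])
    return out

def text_branch_logits(actor_feats, label_embs, activity_embs):
    # Text branch: Image2Text per actor, relate actors, mean-pool,
    # then score each group activity by cosine similarity.
    feats = relate([image2text(f, label_embs) for f in actor_feats])
    dim = len(feats[0])
    pooled = normalize([sum(f[d] for f in feats) / len(feats) for d in range(dim)])
    return [dot(pooled, normalize(e)) for e in activity_embs]

def fuse(image_logits, text_logits, alpha=0.5):
    # ASSUMPTION: the text branch is "plugged in" by adding its scores
    # to the logits of an unchanged image-based method.
    return [i + alpha * t for i, t in zip(image_logits, text_logits)]

# Toy stand-ins for frozen text embeddings of two action labels and
# two group-activity labels (hypothetical values).
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
activity_embs = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.0]]
actor_feats = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1]]  # per-actor image features
image_logits = [0.2, 0.1]                          # from the image branch

text_logits = text_branch_logits(actor_feats, label_embs, activity_embs)
fused = fuse(image_logits, text_logits)
print(fused)
```

In this sketch the image branch is left untouched and the text branch only contributes an extra score per class, which is one simple way to read "plug-and-play"; setting `alpha=0.0` recovers the image-only prediction.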

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Contrastive Language-Image Pre-training