EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

Hyo Jin Jon; Longbin Jin; Eun Yi Kim

arXiv:2604.22595 · cs.CV · April 27, 2026


TL;DR

EV-CLIP is an efficient visual-prompting framework that adapts CLIP for few-shot video action recognition. It targets spatial perception under visual challenges such as low light and egocentric viewpoints, and it outperforms existing parameter-efficient methods across diverse scenes.

Contribution

Introduces EV-CLIP, which combines mask prompts for spatial adaptation with context prompts for temporal adaptation, and evaluates it on multiple datasets.

Findings

01. EV-CLIP outperforms existing parameter-efficient methods.
02. Its efficiency is independent of backbone scale.
03. It is effective across diverse visual and semantic domain shifts.

Abstract

CLIP has demonstrated strong generalization in visual domains through natural language supervision, even for video action recognition. However, most existing approaches that adapt CLIP for action recognition have primarily focused on temporal modeling, often overlooking spatial perception. In real-world scenarios, visual challenges such as low-light environments or egocentric viewpoints can severely impair spatial understanding, an essential precursor for effective temporal reasoning. To address this limitation, we propose Efficient Visual Prompting for CLIP (EV-CLIP), an efficient adaptation framework designed for few-shot video action recognition across diverse scenes and viewpoints. EV-CLIP introduces two visual prompts: mask prompts, which guide the model's attention to action-relevant regions by reweighting pixels, and context prompts, which perform lightweight temporal modeling by…
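The abstract describes mask prompts as reweighting pixels so that the model attends to action-relevant regions. The sketch below illustrates that idea under assumptions the source does not state: a single H×W map of logits shared by every frame and channel, and a hypothetical 2·sigmoid parameterization so that an all-zero initialization leaves the frames unchanged. It shows only the forward reweighting applied before frames would reach the CLIP image encoder. In training, the logits would presumably be learned by backpropagation with CLIP kept frozen, which is also an assumption. The names `mask_weights` and `apply_mask_prompt` are hypothetical and are not taken from the AI-CV-Lab/EV-CLIP repository.

```python
import math
from typing import List

# Plain nested lists keep this sketch dependency-free:
#   Frame  = [channels][height][width]
#   Video  = [frames] of Frame
#   Logits = [height][width]  (hypothetical: one mask prompt shared by all frames/channels)
Frame = List[List[List[float]]]
Video = List[Frame]
Logits = List[List[float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mask_weights(logits: Logits) -> List[List[float]]:
    """Map unconstrained logits to per-pixel weights in [0, 2].

    Assumed parameterization: weight = 2 * sigmoid(logit). A logit of 0 gives
    weight 1.0 (identity); negative logits suppress a pixel, positive ones
    amplify it.
    """
    return [[2.0 * _sigmoid(z) for z in row] for row in logits]


def apply_mask_prompt(video: Video, logits: Logits) -> Video:
    """Multiply every pixel of every frame and channel by the shared mask weight."""
    weights = mask_weights(logits)
    height, width = len(weights), len(weights[0])
    out: Video = []
    for frame in video:
        new_frame = []
        for channel in frame:
            if len(channel) != height or len(channel[0]) != width:
                raise ValueError("mask and frame spatial sizes differ")
            new_frame.append(
                [[channel[i][j] * weights[i][j] for j in range(width)]
                 for i in range(height)]
            )
        out.append(new_frame)
    return out


if __name__ == "__main__":
    # One frame, one channel, 2x2 pixels.
    video = [[[[1.0, 2.0], [3.0, 4.0]]]]
    identity = apply_mask_prompt(video, [[0.0, 0.0], [0.0, 0.0]])
    suppressed = apply_mask_prompt(video, [[0.0, 0.0], [0.0, -20.0]])
    print(identity)                 # [[[[1.0, 2.0], [3.0, 4.0]]]]
    print(suppressed[0][0][1][1])   # a value close to 0
```

With zero logits the output equals the input; driving one logit strongly negative suppresses that pixel in every frame. Because the mask lives in pixel space in this sketch, its size depends on the input resolution rather than on the encoder, but whether EV-CLIP's mask prompts are parameterized this way is not stated in the abstract.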


Code & Models

AI-CV-Lab/EV-CLIP (GitHub)
