Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition
WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

TL;DR
This paper introduces a novel approach for Video Long-Tailed Recognition that leverages minority-oriented vicinity expansion and attentive aggregation to improve recognition accuracy across diverse and imbalanced video categories.
Contribution
The work proposes two learnable feature aggregators and a minority-oriented vicinity expansion method to address challenges in VLTR, such as task-irrelevant features and biased training.
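To make the idea of a feature aggregator concrete, here is a minimal sketch of attention-based pooling that turns frame-level features into a single video-level feature. It is not the paper's aggregator: the function names (`softmax`, `attentive_aggregate`), the dot-product scoring, and the fixed `query` vector are all assumptions chosen for illustration. In a trained model the query (or the layers producing the scores) would be learned.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(frames: List[List[float]], query: List[float]) -> List[float]:
    """Pool frame-level features into one video-level feature.

    Hypothetical scheme: each frame is scored by its dot product with
    `query`, the scores are normalized with softmax, and the output is
    the weighted sum of the frame features.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


# Three toy frames with 2-dimensional features (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every frame equally, so the result is the plain mean.
uniform_feature = attentive_aggregate(frames, [0.0, 0.0])

# A query that favors the first dimension down-weights the second frame.
focused_feature = attentive_aggregate(frames, [10.0, 0.0])
```

With a zero query the aggregator reduces to average pooling. A non-zero query shifts weight toward the frames that align with it, which is the intuition behind suppressing task-irrelevant frames when only video-level labels are available.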
Findings
Achieves state-of-the-art results on VideoLT and Imbalanced-MiniKinetics200 datasets.
Reports 18% and 58% relative improvements on head and tail classes, respectively, with ResNet-50 features.
Effectively alleviates long-tailed distribution issues in video recognition.
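For readers unfamiliar with the term, a relative improvement measures the gain as a fraction of the baseline score rather than as an absolute difference. The sketch below shows the arithmetic. The helper name `relative_improvement` and every number in it are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical accuracies (NOT taken from the paper), chosen so the
# arithmetic is easy to follow.
head_gain = relative_improvement(baseline=50.0, new=59.0)  # 9 points on a base of 50
tail_gain = relative_improvement(baseline=25.0, new=39.5)  # 14.5 points on a base of 25
```

The same absolute gain produces a larger relative improvement when the baseline is lower, which is one reason relative gains on tail classes can look much larger than those on head classes.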
Abstract
A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of their categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model for high-quality features, (2) acquiring frame-wise labels requires extensive cost, and (3) long-tailed data triggers biased training. Yet, most existing works for VLTR unavoidably utilize image-level features extracted from pretrained models which are task-irrelevant, and learn by video-level labels. Therefore, to deal with such (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. Learnable layers in each aggregator are to produce…
Taxonomy
Topics: Image Enhancement Techniques · Retinal Imaging and Analysis · Domain Adaptation and Few-Shot Learning
