Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

WonJun Moon; Hyun Seok Seong; Jae-Pil Heo

arXiv:2211.13471·cs.CV·November 28, 2022

TL;DR

This paper introduces a novel approach for Video Long-Tailed Recognition that leverages minority-oriented vicinity expansion and attentive aggregation to improve recognition accuracy across diverse and imbalanced video categories.

Contribution

The work proposes two learnable feature aggregators and a minority-oriented vicinity expansion method to address two challenges in Video Long-Tailed Recognition (VLTR): task-irrelevant features and biased training.
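To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `attentive_aggregate` is a generic softmax attention pooling over frame features, with a hypothetical scoring vector `w` standing in for learnable layers, and `minority_oriented_mix` is a mixup-style interpolation whose mixing weight is skewed toward the rarer class, used here only as a stand-in for the idea of expanding the vicinity of minority samples. All function names, parameters, and the weighting rule are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(frames, w):
    """Pool T frame features (each of length D) into one video feature.

    `w` is a hypothetical scoring vector of length D; in a trained model
    it would be learned. Each frame gets a scalar score, the scores are
    normalised with softmax, and the video feature is the weighted sum.
    """
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    video = [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return video, weights


def minority_oriented_mix(feat_a, count_a, feat_b, count_b):
    """Interpolate two video features, favouring the rarer class.

    Assumption for illustration only: the weight on sample `a` is the
    share of training samples belonging to the *other* class, so the
    mixed feature lands closer to whichever class has fewer samples.
    """
    lam = count_b / (count_a + count_b)
    mixed = [lam * x + (1.0 - lam) * y for x, y in zip(feat_a, feat_b)]
    return mixed, lam


# Usage: two frames with 2-D features and a scoring vector that prefers
# the second dimension, followed by a mix of a rare and a common class.
video_feat, attn = attentive_aggregate([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])
mixed_feat, lam = minority_oriented_mix(video_feat, 10, [0.5, 0.5], 90)
```

The attention weights sum to one, so the pooled feature stays in the convex hull of the frame features; the mixing weight depends only on class counts, which keeps the sketch deterministic and easy to inspect.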

Findings

01 Achieves state-of-the-art results on the VideoLT and Imbalanced-MiniKinetics200 datasets.

02 Reports 18% and 58% relative improvements on head and tail classes, respectively, with ResNet-50 features.

03 Alleviates the effects of long-tailed class distributions in video recognition.
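The percentages above are relative, not absolute, improvements. As a small worked illustration of the difference, the sketch below computes a relative improvement from a baseline score and a new score. The scores used are hypothetical placeholders chosen only to reproduce ratios of roughly 18% and 58%; they are not results from the paper.

```python
def relative_improvement(baseline, new):
    """Return the improvement of `new` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical scores, for illustration only (not taken from the paper).
head_gain = relative_improvement(0.50, 0.59)   # roughly 0.18, i.e. 18%
tail_gain = relative_improvement(0.10, 0.158)  # roughly 0.58, i.e. 58%
```

Note that a large relative gain on tail classes can correspond to a small absolute gain when the baseline score is low, as in the second hypothetical case.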

Abstract

A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of their categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model for high-quality features, (2) acquiring frame-wise labels requires extensive cost, and (3) long-tailed data triggers biased training. Yet, most existing works for VLTR unavoidably utilize image-level features extracted from pretrained models which are task-irrelevant, and learn by video-level labels. Therefore, to deal with such (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. Learnable layers in each aggregator are to produce…

Code & Models

Repositories

wjun0830/move
PyTorch · Official

Videos

Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition · Underline

Taxonomy

Topics: Image Enhancement Techniques · Retinal Imaging and Analysis · Domain Adaptation and Few-Shot Learning