VideoLT: Large-scale Long-tailed Video Recognition

Xing Zhang; Zuxuan Wu; Zejia Weng; Huazhu Fu; Jingjing Chen; Yu-Gang Jiang; Larry Davis

arXiv:2105.02668·cs.CV·August 19, 2021

TL;DR

VideoLT is a large-scale, long-tailed video dataset. The paper shows that long-tailed recognition methods designed for images underperform on videos, and it proposes FrameStack, a dynamic frame sampling technique that improves long-tailed video recognition.

Contribution

The paper presents VideoLT, a new long-tailed video dataset, and proposes FrameStack, a novel frame sampling method tailored for long-tailed video recognition.

Findings

01. State-of-the-art long-tailed recognition methods designed for images underperform on videos.
02. FrameStack improves classification accuracy on long-tailed video datasets.
03. Dynamic frame sampling balances class distribution effectively.

Abstract

Label distributions in the real world are often long-tailed and imbalanced, resulting in models biased towards dominant labels. While long-tailed recognition has been extensively studied for image classification tasks, limited effort has been made for the video domain. In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition. Our VideoLT contains 256,218 untrimmed videos, annotated into 1,004 classes with a long-tailed distribution. Through extensive studies, we demonstrate that state-of-the-art methods used for long-tailed image recognition do not perform well in the video domain due to the additional temporal dimension in video data. This motivates us to propose FrameStack, a simple yet effective method for the long-tailed video recognition task. In particular, FrameStack performs sampling at the frame-level in…
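To make the idea of frame-level sampling for class balance concrete, here is a minimal sketch. It is not the authors' FrameStack: this page does not spell out FrameStack's sampling rule, so a simple static rule based on class counts is swapped in purely for illustration. The function names, the `alpha` exponent, and the frame budgets are all hypothetical.

```python
import random


def frames_per_class(class_counts, min_frames=4, max_frames=60, alpha=0.5):
    """Assign a per-video frame budget to each class (hypothetical rule).

    The rarest class gets max_frames per video; classes with more videos
    get proportionally fewer frames, never dropping below min_frames.
    """
    rarest = min(class_counts.values())
    budgets = {}
    for label, count in class_counts.items():
        scale = (rarest / count) ** alpha  # equals 1.0 for the rarest class
        budgets[label] = max(min_frames, round(max_frames * scale))
    return budgets


def sample_frame_indices(num_frames, budget, rng=random):
    """Pick up to `budget` distinct frame indices, returned in temporal order."""
    k = min(budget, num_frames)
    return sorted(rng.sample(range(num_frames), k))


# Toy class counts (made up for illustration, not taken from VideoLT).
class_counts = {"head": 1600, "medium": 400, "tail": 100}
budgets = frames_per_class(class_counts)
print(budgets)

# Sample frames for one hypothetical 300-frame video from the "head" class.
print(sample_frame_indices(300, budgets["head"], random.Random(0)))
```

With these toy counts, videos from the rarest class contribute four times as many frames as videos from the most frequent class, which narrows the gap in total frames per class without discarding any video. A dynamic variant would recompute the budgets during training rather than fixing them from counts up front.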

Code & Models

Repositories

17Skye17/VideoLT (PyTorch, official)
