Long-Term Feature Banks for Detailed Video Understanding

Chao-Yuan Wu; Christoph Feichtenhofer; Haoqi Fan; Kaiming He; Philipp Krähenbühl; Ross Girshick

arXiv:1812.05038 · cs.CV · April 19, 2019 · 33 cites

TL;DR

This paper introduces a long-term feature bank that gives video models context over the entire span of a video, yielding state-of-the-art results on AVA, EPIC-Kitchens, and Charades.

Contribution

The paper proposes a long-term feature bank, supportive information extracted over the entire span of a video, to augment existing video models that would otherwise view only short clips of 2-5 seconds.

Findings

01

Achieved state-of-the-art results on AVA, EPIC-Kitchens, and Charades datasets.

02

Augmenting 3D convolutional networks with the long-term feature bank improves their performance.

03

Demonstrated effectiveness of long-term context in video understanding.

Abstract

To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank (supportive information extracted over the entire span of a video) to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds. Our experiments demonstrate that augmenting 3D convolutional networks with a long-term feature bank yields state-of-the-art results on three challenging video datasets: AVA, EPIC-Kitchens, and Charades.

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging