Coarse-Fine Networks for Temporal Activity Detection in Videos

Kumara Kahatapitiya; Michael S. Ryoo

arXiv:2103.01302 · cs.CV · April 2, 2021

TL;DR

This paper presents Coarse-Fine Networks, a novel two-stream architecture that dynamically processes multiple temporal resolutions in videos, significantly improving long-term activity detection while reducing computational costs.

Contribution

Introduction of a two-stream architecture with learned temporal downsampling and multi-stage fusion for better video representations in activity detection.

Findings

01 Outperforms the state of the art on the Charades dataset

02 Reduces compute and memory footprint

03 Effective in long-term motion analysis

Abstract

In this paper, we introduce Coarse-Fine Networks, a two-stream architecture that benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process inputs at one (or a few) fixed temporal resolutions without any dynamic frame selection. However, we argue that processing multiple temporal resolutions of the input, and doing so dynamically by learning to estimate the importance of each frame, can largely improve video representations, especially in the domain of temporal activity localization. To this end, we propose (1) Grid Pool, a learned temporal downsampling layer to extract coarse features, and (2) Multi-stage Fusion, a spatio-temporal attention mechanism to fuse a fine-grained context with the coarse features. We show that our method outperforms the state of the art for action detection in public…
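The idea of downsampling in time according to per-frame importance can be illustrated with a small sketch. This is not the paper's Grid Pool layer: it is a generic inverse-CDF resampling written in plain Python, and the function names `importance_grid` and `sample_frames` are hypothetical. It also assumes the importance scores are already available, whereas in a trained model they would be predicted by the network.

```python
from bisect import bisect_left


def importance_grid(scores, num_out):
    """Return num_out fractional positions in [0, T], denser where scores are high.

    Assumption: scores are positive per-frame importances, given as input here
    rather than learned.
    """
    total = sum(scores)
    num_frames = len(scores)
    # Cumulative distribution over frames; frame i covers [cdf[i], cdf[i + 1]].
    cdf = [0.0]
    for s in scores:
        cdf.append(cdf[-1] + s / total)
    positions = []
    for k in range(num_out):
        target = (k + 0.5) / num_out  # evenly spaced targets in probability mass
        i = min(max(bisect_left(cdf, target) - 1, 0), num_frames - 1)
        span = cdf[i + 1] - cdf[i]
        frac = (target - cdf[i]) / span if span > 0 else 0.0
        positions.append(i + frac)
    return positions


def sample_frames(features, positions):
    """Linearly interpolate per-frame feature vectors at fractional positions.

    Frame i is treated as centred at i + 0.5 on the position axis.
    """
    last = len(features) - 1
    out = []
    for p in positions:
        x = min(max(p - 0.5, 0.0), float(last))
        lo = int(x)
        hi = min(lo + 1, last)
        w = x - lo
        out.append([(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])])
    return out


# Uniform importance reduces to ordinary strided downsampling.
uniform_positions = importance_grid([1.0] * 8, 4)

# Frames 4 and 5 are marked as important, so most samples land on them.
peaked_positions = importance_grid([1, 1, 1, 1, 6, 6, 1, 1], 4)

# One-dimensional features equal to the frame index, for easy inspection.
coarse = sample_frames([[float(t)] for t in range(8)], uniform_positions)
```

With uniform scores the sampling positions are evenly spaced, which matches fixed-stride temporal pooling. With peaked scores the positions cluster around the high-importance frames, so the coarse representation spends more of its budget there.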

Code & Models

Repositories

kkahatapitiya/Coarse-Fine-Networks
PyTorch · Official
