Dense Semantic Forecasting in Video by Joint Regression of Features and Feature Motion

Josip Šarić; Sacha Vražić; Siniša Šegvić

arXiv:2101.10777·cs.CV·January 6, 2022

TL;DR

This paper introduces a dense semantic forecasting method for video that predicts the pixel-level semantics of a future frame by jointly regressing features and feature motion, achieving state-of-the-art accuracy across multiple dense prediction tasks.

Contribution

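The paper presents a joint regression approach that combines direct feature prediction (F2F) with feature-motion prediction (F2M). The approach applies to various single-frame architectures and tasks, and it decouples the effects of motion from the effects of novelty in a task-agnostic manner.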

Findings

01 Achieves state-of-the-art accuracy in semantic forecasting

02 Effective across semantic, instance, and panoptic segmentation tasks

03 Utilizes deformable convolutions and spatial correlation for improved predictions

Abstract

Dense semantic forecasting anticipates future events in video by inferring pixel-level semantics of an unobserved future image. We present a novel approach that is applicable to various single-frame architectures and tasks. Our approach consists of two modules. The feature-to-motion (F2M) module forecasts a dense deformation field that warps past features into their future positions. The feature-to-feature (F2F) module regresses the future features directly and is therefore able to account for emergent scenery. The compound F2MF model decouples the effects of motion from the effects of novelty in a task-agnostic manner. We aim to apply F2MF forecasting to the most subsampled and the most abstract representation of a desired single-frame model. Our design takes advantage of deformable convolutions and spatial correlation coefficients across neighbouring time instants. We perform experiments on…
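The two-module idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the forecasting networks are left out, so the deformation field, the directly regressed features, and the per-pixel blending weight are plain input arrays. The function names `warp_backward` and `blend_forecasts`, the backward-warping convention, the bilinear sampling, and the weighted blend are all assumptions made for illustration; the abstract only states that F2M warps past features and that F2F regresses future features directly.

```python
import numpy as np


def warp_backward(features, flow):
    """Bilinearly sample `features` (C, H, W) at positions shifted by `flow` (2, H, W).

    Assumed convention: flow[0] is the x (column) offset and flow[1] the
    y (row) offset in pixels, so that
    output[:, y, x] = features[:, y + flow[1, y, x], x + flow[0, y, x]].
    """
    _, h, w = features.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, clamped to the feature-map border.
    src_x = np.clip(xs + flow[0], 0, w - 1)
    src_y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0  # horizontal interpolation weight
    wy = src_y - y0  # vertical interpolation weight
    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bottom = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def blend_forecasts(f2m, f2f, weight):
    """Hypothetical fusion: `weight` (H, W) in [0, 1] favours the warped
    forecast where it is 1 and the directly regressed forecast where it is 0."""
    return weight * f2m + (1 - weight) * f2f


# Toy example: one channel, content moves one pixel to the right,
# so each output pixel samples from the pixel on its left.
past = np.arange(16, dtype=float).reshape(1, 4, 4)
flow = np.zeros((2, 4, 4))
flow[0] = -1.0
warped = warp_backward(past, flow)

# Stand-in for an F2F output; column 0 is treated as newly visible scenery.
regressed = np.full((1, 4, 4), 100.0)
weight = np.ones((4, 4))
weight[:, 0] = 0.0
fused = blend_forecasts(warped, regressed, weight)
```

In this toy setup the leftmost column has no valid source in the past frame, so the warp only repeats the border value there; the blend replaces it with the directly regressed value, which mirrors the abstract's point that F2F can account for emergent scenery while F2M handles motion.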
