Semantic Prediction: Which One Should Come First, Recognition or Prediction?

Hafez Farazi, Jan Nogga, and Sven Behnke

arXiv:2110.02829·cs.CV·October 7, 2021

TL;DR

This paper explores whether semantic extraction should precede or follow video prediction to improve scene understanding, using LFDTN and U-Net models on synthetic and real datasets.

Contribution

It systematically compares the two configurations of semantic prediction and demonstrates their impact on scene understanding tasks.

Findings

1. Semantic extraction before prediction can enhance scene interpretation.
2. The order of prediction and semantic extraction affects downstream task performance.
3. Empirical evaluation on synthetic and real datasets shows the advantages of the proposed approach.

Abstract

The ultimate goal of video prediction is not forecasting future pixel values given some previous frames. Rather, the end goal of video prediction is to discover valuable internal representations from the vast amount of available unlabeled video data in a self-supervised fashion for downstream tasks. One of the primary downstream tasks is interpreting the scene's semantic composition and using it for decision-making. For example, by predicting human movements, an observer can anticipate human activities and collaborate in a shared workspace. Given a pre-trained video prediction model and a pre-trained semantic extraction model, there are two main ways to achieve the same outcome: one can first apply predictions and then extract semantics, or first extract semantics and then predict. We investigate these configurations using the Local Frequency Domain Transformer Network (LFDTN) as the video…
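The two orderings described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `extrapolate` predictor and `threshold` extractor below are hypothetical toy stand-ins (the paper uses LFDTN and a U-Net), and frames are reduced to flat lists of pixel values. The point is only that composing the same two models in different orders can yield different semantic outputs.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def extrapolate(frames: List[Frame]) -> Frame:
    """Hypothetical predictor: linear extrapolation from the last two frames."""
    prev, last = frames[-2], frames[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def threshold(frame: Frame) -> Frame:
    """Hypothetical semantic extractor: binary mask of pixels above 0.5."""
    return [1.0 if v > 0.5 else 0.0 for v in frame]


def predict_then_segment(
    frames: List[Frame],
    predictor: Callable[[List[Frame]], Frame],
    segmenter: Callable[[Frame], Frame],
) -> Frame:
    """Configuration A: predict the next frame, then extract its semantics."""
    return segmenter(predictor(frames))


def segment_then_predict(
    frames: List[Frame],
    predictor: Callable[[List[Frame]], Frame],
    segmenter: Callable[[Frame], Frame],
) -> Frame:
    """Configuration B: extract semantics per frame, then predict the next map."""
    return predictor([segmenter(f) for f in frames])


frames = [[0.0, 0.2, 0.9], [0.0, 0.4, 0.8]]
print(predict_then_segment(frames, extrapolate, threshold))  # [0.0, 1.0, 1.0]
print(segment_then_predict(frames, extrapolate, threshold))  # [0.0, 0.0, 1.0]
```

In this toy case the middle pixel is rising in intensity, so predicting in pixel space crosses the threshold, while predicting on already-thresholded masks sees no motion. Real models differ in many ways, but the example shows why the order is worth evaluating empirically.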

Code & Models

Repositories

ais-bonn/pred_semantic
PyTorch · Official

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Convolution · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing