Geometric Context from Videos
S. Hussain Raza, Matthias Grundmann, Irfan Essa

TL;DR
This paper introduces an algorithm for estimating the broad 3D geometric structure of outdoor video scenes by combining spatio-temporal segmentation with region classification. It achieves high accuracy and uses semi-supervised learning to scale beyond the annotated data.
Contribution
The paper presents a method for geometric scene understanding in outdoor videos, together with a new annotated dataset and a semi-supervised learning framework that expands the labeled training data.
Findings
Achieved 96% accuracy in geometric classification.
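Created an extensive dataset of over 100 ground-truth annotated outdoor videos comprising over 20,000 frames.
Developed a semi-supervised approach that expands the labeled data with high-confidence predictions on unlabeled videos.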
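The semi-supervised expansion can be illustrated with a generic self-training loop: train on the labeled pool, predict on unlabeled samples, move predictions above a confidence threshold into the labeled pool, and repeat. This is a minimal sketch under stated assumptions, not the authors' implementation. The source only says that high-confidence predictions on unlabeled data are added to the labeled pool. The nearest-centroid classifier, the `threshold` and `rounds` values, the class names, and all function names below are hypothetical stand-ins for the paper's region classifiers and selection rule.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, ...]


def fit_centroids(X: Sequence[Feature], y: Sequence[str]) -> Dict[str, Feature]:
    """Hypothetical stand-in classifier: one mean feature vector per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in acc) for c, acc in sums.items()}


def predict_proba(centroids: Dict[str, Feature], x: Feature) -> Dict[str, float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in centroids.items()}
    m = max(scores.values())  # subtract the max score for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
    """Repeatedly move high-confidence unlabeled samples into the labeled pool."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        model = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                X_lab.append(x)  # accept the pseudo-label
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)  # stay unlabeled for the next round
        pool = remaining
        if added == 0:  # nothing new was confident enough; stop early
            break
    return fit_centroids(X_lab, y_lab), X_lab, y_lab, pool


# Usage with toy 2-D features (illustrative class names):
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
y = ["ground", "ground", "sky", "sky"]
unlabeled = [(0.2, 0.1), (4.8, 5.2), (2.5, 2.5)]
model, X_aug, y_aug, leftover = self_train(X, y, unlabeled, threshold=0.9)
```

Here the two points near existing clusters are absorbed, while the ambiguous midpoint `(2.5, 2.5)` never clears the threshold and stays unlabeled. The threshold trades pool growth against the risk of reinforcing the model's own mistakes.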
Abstract
We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes. Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region-classifiers that are trained on appearance and motion features. By examining the homogeneity of the prediction, we combine predictions across multiple segmentation hierarchy levels, alleviating the need to determine the granularity a priori. We built a novel, extensive dataset on geometric context of video to evaluate our method, consisting of over 100 ground-truth annotated outdoor videos with over 20,000 frames. To further scale beyond this dataset, we propose a semi-supervised learning framework to expand the pool of labeled data with high-confidence predictions obtained from unlabeled data. Our system produces an accurate…
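The abstract describes combining per-region predictions across segmentation hierarchy levels according to their homogeneity, but does not say how homogeneity is measured or how levels are merged. The sketch below is one plausible reading, offered as an assumption: homogeneity is taken as one minus the normalized entropy of a region's class distribution, and each level's distribution is weighted by that score before renormalizing. The function names, the entropy-based measure, and the class names are hypothetical.

```python
import math
from typing import Dict, List

Dist = Dict[str, float]  # class name -> probability for one region


def homogeneity(dist: Dist) -> float:
    """Assumed measure: 1 - normalized entropy (1.0 for one-hot, 0.0 for uniform)."""
    k = len(dist)
    if k <= 1:
        return 1.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return 1.0 - h / math.log(k)


def combine_levels(level_dists: List[Dist]) -> Dist:
    """Weight each hierarchy level's prediction by its homogeneity and renormalize.

    Assumes every level predicts over the same set of classes.
    """
    combined: Dist = {}
    total = 0.0
    for dist in level_dists:
        w = homogeneity(dist)
        total += w
        for c, p in dist.items():
            combined[c] = combined.get(c, 0.0) + w * p
    if total == 0.0:
        # Every level was perfectly uniform: fall back to an unweighted average.
        n = len(level_dists)
        return {c: sum(d.get(c, 0.0) for d in level_dists) / n for c in level_dists[0]}
    return {c: v / total for c, v in combined.items()}


# Usage: one pixel's enclosing region at three hierarchy levels (illustrative values).
levels = [
    {"ground": 0.34, "sky": 0.33, "vertical": 0.33},  # fine level, nearly uniform
    {"ground": 0.90, "sky": 0.05, "vertical": 0.05},  # mid level, confident
    {"ground": 0.50, "sky": 0.10, "vertical": 0.40},  # coarse level, mixed
]
combined = combine_levels(levels)
label = max(combined, key=combined.get)
```

The nearly uniform fine-level prediction receives almost no weight, so the confident mid-level region dominates the result. This is how weighting by homogeneity can avoid committing to a single segmentation granularity up front.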
