Propagating Semantic Labels in Video Data

David Balaban; Justin Medich; Pranay Gosar; Justin Hart

arXiv:2310.00783 · cs.CV · October 3, 2023

TL;DR

This paper introduces a method that combines the Segment Anything Model (SAM) with Structure from Motion (SfM) to propagate semantic labels through video. Reprojecting 3D geometry lets a segment found in one frame be carried to later frames, which reduces manual annotation effort.

Contribution

The work presents a novel approach that integrates SAM with SfM for video object segmentation and label propagation, with the aim of labeling frames in less time than manual annotation requires.

Findings

01. Substantial reduction in computation time compared to manual labeling.

02. The system achieves reasonable mask IoU (intersection over union) when compared against manual labels.

03. Performance suffers in terms of tracking accuracy.
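
The mask IoU metric named in the findings can be illustrated with a minimal sketch. The function name `mask_iou`, the nested-list mask representation, and the toy masks below are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 values; a hypothetical representation


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of identical shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0

    # Two empty masks agree everywhere; returning 1.0 here is a convention
    # assumed for this sketch, not something the source specifies.
    return intersection / union if union else 1.0


# Toy example: a propagated mask versus a hand-drawn mask (made-up data).
propagated = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]
manual = [[0, 0, 1],
          [0, 1, 1],
          [0, 0, 1]]
score = mask_iou(propagated, manual)  # 3 shared pixels / 5 pixels in the union
```

An IoU of 1.0 means the propagated mask matches the manual label exactly, and 0.0 means the two masks do not overlap at all.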

Abstract

Semantic segmentation combines two sub-tasks: the identification of pixel-level image masks and the application of semantic labels to those masks. Recently, so-called foundation models have been introduced: general models trained on very large datasets that can be specialized and applied to more specific tasks. One such model, the Segment Anything Model (SAM), performs image segmentation. Semantic segmentation systems such as CLIPSeg and Mask R-CNN are trained on datasets of paired segments and semantic labels. Manual labeling of custom data, however, is time-consuming. This work presents a method for performing segmentation for objects in video. Once an object has been found in a frame of video, the segment can then be propagated to future frames, thus reducing manual annotation effort. The method works by combining SAM with Structure from Motion (SfM). The video input to the system is…
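
As a rough illustration of what reprojecting 3D geometry into a new frame involves, the sketch below projects labeled 3D points through a standard pinhole camera model. It is not the paper's pipeline: the function names, the pose convention (world-to-camera rotation and translation), the intrinsics, and the sample points are all assumptions chosen for the example.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def project_point(
    point: Point3D,
    rotation: Matrix3,       # world-to-camera rotation (assumed convention)
    translation: Point3D,    # world-to-camera translation (assumed convention)
    fx: float, fy: float, cx: float, cy: float,  # pinhole intrinsics
) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates, or None if behind the camera."""
    xc, yc, zc = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def visible_pixels(
    points: Sequence[Point3D],
    rotation: Matrix3,
    translation: Point3D,
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Tuple[int, int]]:
    """Pixels (column, row) where labeled 3D points land inside the image."""
    pixels = []
    for point in points:
        uv = project_point(point, rotation, translation, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v))
    return pixels


# Hypothetical data: points assumed to belong to one labeled object,
# viewed by a camera at the world origin looking down +z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
object_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0), (10.0, 0.0, 2.0)]
seeds = visible_pixels(object_points, identity, (0.0, 0.0, 0.0),
                       500.0, 500.0, 320.0, 240.0, 640, 480)
```

In a setup like this, the resulting pixel locations could serve as hints for where a labeled object appears in another frame. How the paper actually uses the reprojected geometry is not stated in the excerpt above.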

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Segment Anything Model