Submodular video object proposal selection for semantic object segmentation
Tinghuai Wang

TL;DR
This paper introduces a novel submodular optimization approach for selecting representative video object proposals, enhancing semantic segmentation by leveraging long-term contextual dependencies and reducing noise.
Contribution
It proposes a submodular function-based selection method for video object proposals that improves semantic segmentation accuracy by capturing long-term dependencies.
Findings
Outperforms state-of-the-art methods on a challenging dataset.
Effectively captures long-term contextual dependencies.
Reduces noise in object detection proposals.
Abstract
Learning a data-driven spatio-temporal semantic representation of the objects is the key to coherent and consistent labelling in video. This paper proposes to achieve semantic video object segmentation by learning a data-driven representation which captures the synergy of multiple instances from continuous frames. To prune the noisy detections, we exploit the rich information among multiple instances and select the discriminative and representative subset. This selection process is formulated as a facility location problem solved by maximising a submodular function. Our method retrieves the longer-term contextual dependencies, which underpin a robust semantic video object segmentation algorithm. We present extensive experiments on a challenging dataset that demonstrate the superior performance of our approach compared with the state-of-the-art methods.
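The abstract states only that proposal selection is cast as a facility location problem solved by maximising a submodular function. As a minimal sketch of that general idea, and not the paper's actual objective or solver, the Python below greedily maximises the standard facility location function F(S) = sum_i max_{j in S} sim[i, j]. The function name `greedy_facility_location`, the similarity matrix `sim`, and the budget `k` are hypothetical, chosen for illustration. How the paper measures similarity between proposals is not given here.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick up to k representatives maximising
    F(S) = sum_i max_{j in S} sim[i, j].

    sim: (n, n) array of non-negative similarities, where sim[i, j]
         says how well candidate j represents item i (an assumption
         for this sketch, not the paper's definition).
    Returns the selected indices and the objective value F(S).
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    chosen = np.zeros(n, dtype=bool)
    selected = []
    best = np.zeros(n)  # best[i] = max similarity of item i to the selected set

    for _ in range(min(k, n)):
        # Marginal gain of adding each candidate j to the current set.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf  # never pick the same candidate twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best = np.maximum(best, sim[:, j])

    return selected, float(best.sum())
```

Because the facility location function is monotone and submodular, this greedy rule carries the classical (1 - 1/e) approximation guarantee under a cardinality constraint. Whether the paper uses this exact greedy scheme is an assumption.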
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques
