Foundation Models for Amodal Video Instance Segmentation in Automated Driving
Jasmin Breitenstein, Franz Jünger, Andreas Bär, Tim Fingscheidt

TL;DR
This paper introduces S-AModal, a novel approach leveraging foundation models and point memory to perform amodal video instance segmentation in automated driving, achieving state-of-the-art results without requiring amodal video labels.
Contribution
It proposes fine-tuning the Segment Anything Model (SAM) for amodal instance segmentation, using point prompts sampled from visible masks and a point memory to track instances across frames.
Findings
Achieves state-of-the-art amodal video instance segmentation results.
Reduces dependency on expensive amodal video labels.
Demonstrates effective point-based tracking with foundation models.
Abstract
In this work, we study amodal video instance segmentation for automated driving. Previous works perform amodal video instance segmentation relying on methods trained on entirely labeled video data with techniques borrowed from standard video instance segmentation. Such amodally labeled video data is difficult and expensive to obtain and the resulting methods suffer from a trade-off between instance segmentation and tracking performance. To largely solve this issue, we propose to study the application of foundation models for this task. More precisely, we exploit the extensive knowledge of the Segment Anything Model (SAM), while fine-tuning it to the amodal instance segmentation task. Given an initial video instance segmentation, we sample points from the visible masks to prompt our amodal SAM. We use a point memory to store those points. If a previously observed instance is not…
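The point sampling and point memory described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sample_points` and `PointMemory`, the uniform random sampling, and the list-of-lists mask format are all assumptions made for illustration. How the stored points are used once an instance is no longer visible is cut off in the abstract above and is deliberately not modeled here.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def sample_points(mask: List[List[int]], k: int, seed: int = 0) -> List[Point]:
    """Sample up to k foreground pixels from a binary visible mask.

    Assumption: uniform sampling without replacement; the paper may
    use a different sampling strategy.
    """
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    rng = random.Random(seed)
    return rng.sample(foreground, min(k, len(foreground)))


class PointMemory:
    """Hypothetical store mapping an instance id to its latest point prompts."""

    def __init__(self) -> None:
        self._points: Dict[int, List[Point]] = {}

    def update(self, instance_id: int, points: List[Point]) -> None:
        # Keep the previous entry if no points could be sampled this frame.
        if points:
            self._points[instance_id] = points

    def get(self, instance_id: int) -> List[Point]:
        # Returns an empty list for instances that were never observed.
        return self._points.get(instance_id, [])


# Usage: one frame with a single visible instance (id 7).
visible_mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
memory = PointMemory()
memory.update(7, sample_points(visible_mask, k=2))
prompts = memory.get(7)  # points that could be passed to a point-promptable model
```

Keying the memory by instance id means the prompts follow the identities assigned by the initial video instance segmentation, so the sketch itself needs no tracking logic.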
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Segment Anything Model
