Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation
Finlay G. C. Hudson, William A. P. Smith

TL;DR
This paper introduces TABE, a zero-shot amodal video object segmentation pipeline that uses a pretrained video diffusion model and a single initial mask to perform occlusion-aware object tracking without re-training the diffusion model; the model is only fine-tuned at test time on the tracked object.
Contribution
The novel TABE pipeline enables zero-shot amodal segmentation using a pretrained diffusion model and a single initial mask, eliminating the need for class-specific training.
Findings
Effective amodal segmentation even with full occlusion
No re-training needed for new objects or classes
Outperforms existing methods in zero-shot scenarios
Abstract
We present Track Anything Behind Everything (TABE), a novel pipeline for zero-shot amodal video object segmentation. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame where the object is visible, enabling flexible, zero-shot inference. We pose amodal segmentation as generative outpainting from modal (visible) masks using a pretrained video diffusion model. We do not need to re-train the diffusion model to accommodate additional input channels; instead, we use a pretrained model that we fine-tune at test time to specialise it towards the tracked object. Our TABE pipeline is specifically designed to handle amodal completion, even in scenarios where objects are completely occluded. Our model and code will be released.
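The abstract frames amodal segmentation as outpainting from modal (visible) masks, starting from a query mask in the first frame where the object is visible. As a rough illustration only, assuming binary per-frame masks stored as NumPy arrays, the sketch below shows the mask bookkeeping such a pipeline needs: picking the query frame, building an outpainting input that keeps only the visible object pixels, and ensuring the final amodal mask covers every visible pixel. The function names and the grey fill value 0.5 are hypothetical; this is not the TABE implementation and it leaves out the diffusion model and test-time fine-tuning entirely.

```python
import numpy as np


def first_visible_frame(modal_masks: np.ndarray) -> int:
    """Return the index of the first frame whose modal (visible) mask is non-empty.

    modal_masks: bool array of shape (T, H, W). Returns -1 if the object is never visible.
    """
    visible = modal_masks.reshape(modal_masks.shape[0], -1).any(axis=1)
    idx = np.flatnonzero(visible)
    return int(idx[0]) if idx.size else -1


def build_outpainting_inputs(frames: np.ndarray, modal_masks: np.ndarray, fill: float = 0.5):
    """Prepare a hypothetical outpainting input from modal masks.

    frames: float array (T, H, W, 3) in [0, 1]; modal_masks: bool array (T, H, W).
    Returns:
      conditioning    - frames with every non-visible-object pixel set to `fill`
      generate_region - True where a generative model would be free to paint,
                        i.e. the complement of the modal mask
    """
    keep = modal_masks[..., None]  # broadcast the mask over the colour channels
    conditioning = np.where(keep, frames, fill)
    generate_region = ~modal_masks
    return conditioning, generate_region


def enforce_superset(amodal_pred: np.ndarray, modal_masks: np.ndarray) -> np.ndarray:
    """An amodal mask must contain all visible pixels, so union it with the modal mask."""
    return amodal_pred | modal_masks
```

For example, with a clip whose object first appears in frame 1, `first_visible_frame` returns 1, and `enforce_superset` guarantees that a predicted amodal mask never drops pixels that are observably part of the object, whatever the generative step produced.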
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
