Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory
Zhengtong Zhu, Jiaqing Fan, Zhixuan Liu, Fanzhang Li

TL;DR
This paper introduces SDAM, a training-free, spatio-temporal decoupled reasoning framework for video segmentation that uses adaptive object memory to improve stability and accuracy across video sequences without fine-tuning.
Contribution
The paper proposes a novel training-free reasoning framework with an adaptive object memory module and spatio-temporal decoupling for stable, accurate video segmentation.
Findings
Outperforms fine-tuned methods on five benchmark datasets
Achieves precise spatial localization and segmentation
Provides stable cross-frame temporal propagation
Abstract
Reasoning Video Object Segmentation (ReasonVOS) is a challenging task that requires stable object segmentation across video sequences from implicit and complex textual inputs. Previous methods fine-tune Multimodal Large Language Models (MLLMs) to produce segmentation outputs, which demands substantial resources. Additionally, some existing methods couple the processing of spatial and temporal information, which can reduce the temporal stability of the model. To address these issues, we propose Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory (SDAM). We aim to design a training-free reasoning video segmentation framework that, using only pre-trained models, outperforms existing methods that require fine-tuning. Meanwhile, we propose an Adaptive Object Memory module that selects and memorizes…
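As a rough illustration of what "spatio-temporal decoupling" with an object memory could look like in code, the sketch below separates a spatial stage (localize the target once, on a key frame) from a temporal stage (propagate the mask to the other frames while consulting a small memory). This is not the authors' implementation. The abstract is truncated before it says what the Adaptive Object Memory selects, so the selection rule used here (keep the highest-scoring entries) is purely an assumption, and every name (`ObjectMemory`, `segment_video`, `pick_key_frame`, `localize`, `propagate`) is hypothetical. The three callables stand in for pre-trained models that are not specified in this summary.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple

Mask = List[List[int]]  # toy stand-in for a binary segmentation mask


@dataclass
class ObjectMemory:
    """Hypothetical bounded memory that keeps the `capacity` best-scoring masks."""
    capacity: int = 3
    entries: List[Tuple[float, int, Mask]] = field(default_factory=list)

    def update(self, score: float, frame_idx: int, mask: Mask) -> None:
        self.entries.append((score, frame_idx, mask))
        # Assumed selection rule (not from the paper): rank by score, drop the rest.
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]

    def frame_indices(self) -> List[int]:
        return sorted(idx for _, idx, _ in self.entries)


def segment_video(
    frames: Sequence[Any],
    query: str,
    pick_key_frame: Callable[[Sequence[Any], str], int],
    localize: Callable[[Any, str], Tuple[Mask, float]],
    propagate: Callable[[Any, ObjectMemory], Tuple[Mask, float]],
    capacity: int = 3,
) -> Tuple[List[Mask], ObjectMemory]:
    # Spatial stage: reason about the text query on a single key frame only.
    key_idx = pick_key_frame(frames, query)
    key_mask, key_score = localize(frames[key_idx], query)

    memory = ObjectMemory(capacity)
    memory.update(key_score, key_idx, key_mask)
    masks = {key_idx: key_mask}

    # Temporal stage: propagate forward, then backward, from the key frame.
    # The text query is not used here; only the frame and the memory are.
    forward = list(range(key_idx + 1, len(frames)))
    backward = list(range(key_idx - 1, -1, -1))
    for t in forward + backward:
        mask, score = propagate(frames[t], memory)
        masks[t] = mask
        memory.update(score, t, mask)

    return [masks[t] for t in range(len(frames))], memory
```

With stub callables in place of real models, `segment_video` returns one mask per frame plus the final memory. The point of the structure is that the text query influences only the key-frame localization, while cross-frame consistency is handled separately by propagation and the memory.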
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Neural Network Applications
