A Semi-Self-Supervised Approach for Dense-Pattern Video Object Segmentation
Keyhan Najafian, Farhad Maleki, Lingling Jin, Ian Stavness

TL;DR
This paper introduces a semi-self-supervised, diffusion-based method for dense video object segmentation, effectively reducing the need for extensive manual annotations in agricultural scenarios with densely packed, occluded objects.
Contribution
The proposed approach combines synthetic data and pseudo-labeling with diffusion models to improve dense VOS performance with minimal manual annotations.
Findings
Achieved a Dice score of 0.79 on drone-captured wheat head videos.
Effective in diverse agricultural settings and different growth stages.
Reduces reliance on large-scale manual annotations for dense object segmentation.
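For readers unfamiliar with the metric in the first finding, the Dice score measures the overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as nested lists of 0/1.

    Illustrative helper only; the paper's exact evaluation protocol
    (thresholding, per-frame vs. per-video averaging) is not specified here.
    """
    intersection = 0  # pixels that are foreground in both masks
    pred_area = 0     # foreground pixels in the prediction
    target_area = 0   # foreground pixels in the ground truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p:
                pred_area += 1
            if t:
                target_area += 1
    if pred_area + target_area == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / (pred_area + target_area)


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

Under this definition, a score of 0.79 means the overlap, counted twice, covers 79% of the combined predicted and ground-truth foreground area.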
Abstract
Video object segmentation (VOS) -- predicting pixel-level regions for objects within each frame of a video -- is particularly challenging in agricultural scenarios, where videos of crops include hundreds of small, dense, and occluded objects (stems, leaves, flowers, pods) that sway and move unpredictably in the wind. Supervised training is the state of the art for VOS, but it requires large sets of pixel-accurate, human-annotated videos, which are costly to produce when each frame contains many densely packed objects. To address these challenges, we propose a semi-self-supervised spatiotemporal approach for dense VOS (DVOS) that uses a diffusion-based method with multi-task (reconstruction and segmentation) learning. We train the model first with synthetic data that mimics the camera and object motion of real videos and then with pseudo-labeled videos. We evaluate our DVOS method for…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Surveillance and Tracking Methods
Methods: VOS · Sparse Evolutionary Training
