VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation

Chika Maduabuchi; Ericmoore Jossou; Matteo Bucci

arXiv:2410.21304 · cs.CV · February 7, 2025

TL;DR

VideoSAM is a fine-tuned adaptation of the Segment Anything Model (SAM) that improves high-speed video segmentation accuracy across four fluid environments (water, FC-72, nitrogen, and argon), outperforming U-Net on complex segmentation tasks.

Contribution

We introduce VideoSAM, a specialized adaptation of SAM for high-speed video (HSV) segmentation, along with an open-source dataset, to improve the robustness and accuracy of phase detection in scientific videos.

Findings

01. VideoSAM outperforms U-Net in complex segmentation tasks.

02. The model generalizes across the four fluid environments tested: water, FC-72, nitrogen, and argon.

03. The open-source dataset supports future research in HSV segmentation.

Abstract

High-speed video (HSV) segmentation is essential for analyzing dynamic physical processes in scientific and industrial applications, such as boiling heat transfer. Existing models like U-Net struggle with generalization and accurately segmenting complex bubble formations. We present VideoSAM, a specialized adaptation of the Segment Anything Model (SAM), fine-tuned on a diverse HSV dataset for phase detection. Through diverse experiments, VideoSAM demonstrates superior performance across four fluid environments -- Water, FC-72, Nitrogen, and Argon -- significantly outperforming U-Net in complex segmentation tasks. In addition to introducing VideoSAM, we contribute an open-source HSV segmentation dataset designed for phase detection, enabling future research in this domain. Our findings underscore VideoSAM's potential to set new standards in robust and accurate HSV segmentation. The code…
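The abstract compares VideoSAM and U-Net on segmentation quality but, as excerpted here, does not name the metrics used. As a minimal sketch of how two binary phase masks could be compared, the example below computes intersection over union (IoU) and the Dice coefficient with NumPy. The choice of these two metrics, the function name `iou_and_dice`, the vapor/liquid labeling, and the toy masks are all assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape.

    Hypothetical helper for illustration; not part of the VideoSAM codebase.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    # Both masks empty: treat as perfect agreement (a convention chosen here).
    if union == 0:
        return 1.0, 1.0

    return float(intersection / union), float(2 * intersection / total)


# Toy 4x4 frame, values invented for illustration: 1 = vapor, 0 = liquid.
ground_truth = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])
prediction = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

iou, dice = iou_and_dice(prediction, ground_truth)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

In this toy case the prediction misses one of the four vapor pixels, so the two masks share 3 pixels out of a union of 4. Averaging such per-frame scores over a video is one common way to summarize a model's performance on a dataset.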

Code & Models

Repositories

chikap421/videosam (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques · Advanced Neural Network Applications

Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net · Sparse Evolutionary Training