SAM2Auto: Auto Annotation Using FLASH

Arash Rocky; Q.M. Jonathan Wu

arXiv:2506.07850 · cs.CV · June 10, 2025

Open Access

TL;DR

SAM2Auto is an automated video annotation pipeline that combines robust object detection and real-time segmentation, significantly reducing manual effort and costs while maintaining high accuracy across diverse datasets.

Contribution

The paper introduces SAM2Auto, which the authors describe as the first fully automated, dataset-agnostic video annotation system requiring no human intervention or dataset-specific training.

Findings

01. Achieves annotation accuracy comparable to manual methods

02. Reduces annotation time and labor costs dramatically

03. Handles diverse datasets without retraining or extensive tuning

Abstract

Vision-Language Models (VLMs) lag behind Large Language Models due to the scarcity of annotated datasets, as creating paired visual-textual annotations is labor-intensive and expensive. To address this bottleneck, we introduce SAM2Auto, the first fully automated annotation pipeline for video datasets requiring no human intervention or dataset-specific training. Our approach consists of two key components: SMART-OD, a robust object detection system that combines automatic mask generation with open-world object detection capabilities, and FLASH (Frame-Level Annotation and Segmentation Handler), a multi-object real-time video instance segmentation (VIS) system that maintains consistent object identification across video frames even with intermittent detection gaps. Unlike existing open-world detection methods that require frame-specific hyperparameter tuning and suffer from numerous false…
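The abstract says FLASH keeps object identities consistent across frames even when detections drop out for a while. The sketch below illustrates that general idea only; it is not the authors' implementation. It swaps in a plain greedy IoU matcher over bounding boxes, and the names `GapTolerantTracker`, `iou_threshold`, and `max_gap`, as well as the `(x1, y1, x2, y2)` box format, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    last_seen: int  # index of the last frame in which this object was detected


class GapTolerantTracker:
    """Illustrative ID keeper (an assumption, not the paper's FLASH module)."""

    def __init__(self, iou_threshold: float = 0.3, max_gap: int = 5) -> None:
        self.iou_threshold = iou_threshold
        self.max_gap = max_gap  # missed frames tolerated before an ID is retired
        self.tracks: List[Track] = []
        self._next_id = 0

    def update(self, frame_idx: int, detections: List[Box]) -> List[int]:
        """Return one track ID per detection, reusing IDs across short gaps."""
        # Retire tracks that have been missing for more than max_gap frames.
        self.tracks = [
            t for t in self.tracks if frame_idx - t.last_seen - 1 <= self.max_gap
        ]
        # Collect (score, track index, detection index) for sufficiently
        # overlapping pairs, best overlap first.
        pairs = sorted(
            (
                (iou(t.box, d), ti, di)
                for ti, t in enumerate(self.tracks)
                for di, d in enumerate(detections)
                if iou(t.box, d) >= self.iou_threshold
            ),
            reverse=True,
        )
        ids = [-1] * len(detections)
        used_tracks = set()
        # Greedy one-to-one assignment in order of decreasing overlap.
        for _, ti, di in pairs:
            if ti in used_tracks or ids[di] != -1:
                continue
            track = self.tracks[ti]
            track.box, track.last_seen = detections[di], frame_idx
            ids[di] = track.track_id
            used_tracks.add(ti)
        # Any detection left unmatched starts a new track with a fresh ID.
        for di, d in enumerate(detections):
            if ids[di] == -1:
                self.tracks.append(Track(self._next_id, d, frame_idx))
                ids[di] = self._next_id
                self._next_id += 1
        return ids
```

Calling `update` once per frame returns the IDs for that frame's detections. An object seen in frame 0, missed in frames 1 and 2, and re-detected near the same position in frame 3 keeps its ID whenever `max_gap` is at least 2. A real pipeline would likely match on segmentation masks rather than boxes, which this sketch does not attempt.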

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning