SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2

Gergely Dinya; András Gelencsér; Krisztina Kupán; Clemens Küpper; Kristóf Karacs; Anna Gelencsér-Horváth

arXiv:2601.11301·cs.CV·April 22, 2026

TL;DR

SAMannot is an open-source, memory-efficient framework that enables high-fidelity, privacy-preserving video instance segmentation with human-in-the-loop interaction, suitable for research and complex annotation tasks.

Contribution

SAMannot introduces a modified SAM2 dependency and a processing layer that together reduce resource requirements, improving responsiveness and usability in research workflows.

Findings

01. Verified on animal behavior tracking datasets.

02. Provides scalable, private, and cost-effective annotation.

03. Supports research-ready dataset generation in YOLO and PNG formats.
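
To make the YOLO export mentioned above concrete, here is a minimal sketch of how one binary instance mask could be turned into a YOLO-style label line. It assumes the common YOLO detection layout (`class x_center y_center width height`, all normalized to the image size); the summary does not say which YOLO variant SAMannot writes, and the function names `mask_to_yolo_bbox` and `yolo_label_line` are hypothetical, not part of SAMannot or SAM2.

```python
from typing import List, Optional, Tuple

# Assumption: a mask is a list of rows, each row a list of 0/1 pixel values.
Mask = List[List[int]]


def mask_to_yolo_bbox(mask: Mask) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_center, y_center, width, height) normalized to [0, 1].

    Returns None when the mask contains no foreground pixels.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    # Indices of rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None

    # Treat pixel i as covering the interval [i, i + 1).
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1

    return (
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    )


def yolo_label_line(class_id: int, mask: Mask) -> Optional[str]:
    """Format one label line in the assumed YOLO detection layout."""
    box = mask_to_yolo_bbox(mask)
    if box is None:
        return None
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


if __name__ == "__main__":
    demo = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(yolo_label_line(0, demo))
```

In this sketch a frame in which the instance is absent yields `None` rather than a degenerate box, so a caller can simply skip writing a line for that instance.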

Abstract

Current research workflows for precise video segmentation are often forced into a compromise between labor-intensive manual curation, costly commercial platforms, and/or privacy-compromising cloud-based services. The demand for high-fidelity video instance segmentation in research is often hindered by the bottleneck of manual annotation and the privacy concerns of cloud-based tools. We present SAMannot, an open-source, local framework that integrates the Segment Anything Model 2 (SAM2) into a human-in-the-loop workflow. To address the high resource requirements of foundation models, we modified the SAM2 dependency and implemented a processing layer that minimizes computational overhead and maximizes throughput, ensuring a highly responsive user interface. Key features include persistent instance identity management, an automated "lock-and-refine" workflow with barrier frames, and a…
