Automated Video Segmentation Machine Learning Pipeline

Johannes Merz; Lucien Fostier

arXiv:2507.07242·cs.CV·July 11, 2025

TL;DR

This paper introduces an automated, machine learning-based video segmentation pipeline that improves efficiency in visual effects production by providing consistent, accurate masks with minimal manual effort.

Contribution

It presents a novel integrated pipeline combining text-driven object detection, refined segmentation, and video tracking for VFX, enabling rapid and consistent mask generation.

Findings

01

Significantly reduces manual segmentation effort.

02

Speeds up the creation of preliminary composites.

03

Enhances temporal consistency of masks.

Abstract

Visual effects (VFX) production often struggles with slow, resource-intensive mask generation. This paper presents an automated video segmentation pipeline that creates temporally consistent instance masks. It employs machine learning for: (1) flexible object detection via text prompts, (2) refined per-frame image segmentation and (3) robust video tracking to ensure temporal stability. Deployed using containerization and leveraging a structured output format, the pipeline was quickly adopted by our artists. It significantly reduces manual effort, speeds up the creation of preliminary composites, and provides comprehensive segmentation data, thereby enhancing overall VFX production efficiency.
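To make the idea of temporally consistent instance masks concrete, the sketch below shows one simple way per-frame detections could be linked into stable instance IDs across a shot. It is an illustration only and is not the paper's tracking method: it assumes a hypothetical upstream detector that returns a text label and a bounding box per object, and it substitutes greedy intersection-over-union (IoU) matching for the learned video tracker described in the abstract. The names `Detection`, `iou`, `assign_track_ids`, and the `min_iou` threshold are all assumptions introduced for this example.

```python
from dataclasses import dataclass

# Axis-aligned box as (x0, y0, x1, y1) in pixels. Hypothetical detector output.
Box = tuple[float, float, float, float]


@dataclass
class Detection:
    label: str  # text prompt that produced this detection, e.g. "person"
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_track_ids(
    frames: list[list[Detection]], min_iou: float = 0.3
) -> list[list[int]]:
    """Give each detection an instance ID that stays stable across frames.

    Greedy matching: within each frame, pair detections with the previous
    frame's instances in order of decreasing IoU, requiring equal labels.
    Unmatched detections start a new instance ID.
    """
    next_id = 0
    previous: list[tuple[int, Detection]] = []
    result: list[list[int]] = []
    for detections in frames:
        # All same-label (previous, current) pairs, best overlap first.
        candidates = sorted(
            (
                (iou(prev.box, det.box), p_idx, d_idx)
                for p_idx, (_, prev) in enumerate(previous)
                for d_idx, det in enumerate(detections)
                if prev.label == det.label
            ),
            reverse=True,
        )
        ids = [-1] * len(detections)  # -1 marks "not yet assigned"
        used_prev: set[int] = set()
        for score, p_idx, d_idx in candidates:
            if score < min_iou:
                break  # remaining pairs overlap even less
            if p_idx in used_prev or ids[d_idx] != -1:
                continue
            ids[d_idx] = previous[p_idx][0]
            used_prev.add(p_idx)
        # Anything still unassigned is treated as a new instance.
        for d_idx in range(len(detections)):
            if ids[d_idx] == -1:
                ids[d_idx] = next_id
                next_id += 1
        previous = list(zip(ids, detections))
        result.append(ids)
    return result


# Usage: two objects swap list order in frame 1; the car leaves in frame 2.
shot = [
    [Detection("person", (0, 0, 10, 10)), Detection("car", (50, 50, 80, 70))],
    [Detection("car", (52, 50, 82, 70)), Detection("person", (1, 0, 11, 10))],
    [Detection("person", (2, 0, 12, 10))],
]
print(assign_track_ids(shot))  # [[0, 1], [1, 0], [0]]
```

Because IDs follow the object rather than its position in the per-frame list, a downstream consumer can key masks by instance ID for the whole shot. A production tracker would also need to handle occlusion and re-entry, which this greedy frame-to-frame matching does not attempt.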
