Automated Video Segmentation Machine Learning Pipeline
Johannes Merz, Lucien Fostier

TL;DR
This paper introduces an automated, machine learning-based video segmentation pipeline that improves efficiency in visual effects production by providing consistent, accurate masks with minimal manual effort.
Contribution
The paper presents a novel integrated pipeline for VFX that combines text-driven object detection, refined segmentation, and video tracking, enabling rapid and consistent mask generation.
Findings
Significantly reduces manual segmentation effort.
Speeds up the preliminary compositing process.
Enhances temporal consistency of masks.
Abstract
Visual effects (VFX) production often struggles with slow, resource-intensive mask generation. This paper presents an automated video segmentation pipeline that creates temporally consistent instance masks. It employs machine learning for: (1) flexible object detection via text prompts, (2) refined per-frame image segmentation, and (3) robust video tracking to ensure temporal stability. Deployed using containerization and leveraging a structured output format, the pipeline was quickly adopted by our artists. It significantly reduces manual effort, speeds up the creation of preliminary composites, and provides comprehensive segmentation data, thereby enhancing overall VFX production efficiency.
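The abstract describes three chained ML stages whose results are written in a structured output format. The following is a minimal sketch of that data flow only, under stated assumptions. The abstract names neither the models nor the output schema, so `toy_detect`, `toy_segment`, `toy_track`, `run_pipeline`, `to_manifest`, and the JSON layout are hypothetical. The toy stages (bright-pixel "detection", box-to-pixel "refinement", and re-segmentation near the previous location as "tracking") merely stand in for real detection, segmentation, and tracking models. The sketch also assumes detection runs once on the first frame and tracking propagates from there. That is one plausible ordering, not the authors' confirmed design.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

Frame = List[List[int]]                  # toy grayscale frame: rows of pixel values
Box = Tuple[int, int, int, int]          # (x0, y0, x1, y1), inclusive bounds (assumed convention)
Mask = FrozenSet[Tuple[int, int]]        # set of (x, y) pixels belonging to one instance


@dataclass
class Instance:
    instance_id: int
    label: str
    masks: Dict[int, Mask] = field(default_factory=dict)  # frame index -> mask


def bbox(pixels) -> Box:
    """Tight bounding box of a non-empty pixel collection."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


# --- Hypothetical toy stand-ins for the three ML stages ---
def toy_detect(frame: Frame, prompt: str) -> List[Tuple[str, Box]]:
    """Stage 1 stand-in: treat all bright pixels as one object labelled with the prompt."""
    lit = [(x, y) for y, row in enumerate(frame) for x, v in enumerate(row) if v]
    return [(prompt, bbox(lit))] if lit else []


def toy_segment(frame: Frame, box: Box) -> Mask:
    """Stage 2 stand-in: refine a box into the bright pixels it contains."""
    x0, y0, x1, y1 = box
    return frozenset((x, y) for y in range(y0, y1 + 1)
                     for x in range(x0, x1 + 1) if frame[y][x])


def toy_track(frames: List[Frame], seed: Mask) -> Dict[int, Mask]:
    """Stage 3 stand-in: propagate a mask by re-segmenting a 1-pixel-padded search box."""
    out: Dict[int, Mask] = {}
    prev = seed
    for t in range(1, len(frames)):
        h, w = len(frames[t]), len(frames[t][0])
        x0, y0, x1, y1 = bbox(prev)
        search = (max(x0 - 1, 0), max(y0 - 1, 0), min(x1 + 1, w - 1), min(y1 + 1, h - 1))
        cur = toy_segment(frames[t], search)
        if not cur:  # object lost: stop propagating this instance
            break
        out[t] = cur
        prev = cur
    return out


def run_pipeline(frames: List[Frame], prompt: str,
                 detect: Callable[[Frame, str], List[Tuple[str, Box]]],
                 segment: Callable[[Frame, Box], Mask],
                 track: Callable[[List[Frame], Mask], Dict[int, Mask]]) -> List[Instance]:
    """Detect on frame 0, refine each detection to a mask, then track it through the clip."""
    instances = []
    for i, (label, box) in enumerate(detect(frames[0], prompt)):
        seed = segment(frames[0], box)
        inst = Instance(i, label, {0: seed})
        inst.masks.update(track(frames, seed))
        instances.append(inst)
    return instances


def to_manifest(instances: List[Instance]) -> str:
    """Hypothetical structured output: one JSON record per instance with per-frame boxes."""
    return json.dumps([
        {"id": inst.instance_id, "label": inst.label,
         "frames": {str(t): list(bbox(m)) for t, m in sorted(inst.masks.items()) if m}}
        for inst in instances
    ], indent=2)


if __name__ == "__main__":
    # A 2x2 bright square moving one pixel right per frame in a 6x4 frame.
    def make_frame(offset: int) -> Frame:
        return [[1 if 1 <= y <= 2 and offset <= x <= offset + 1 else 0 for x in range(6)]
                for y in range(4)]

    clip = [make_frame(o) for o in range(3)]
    print(to_manifest(run_pipeline(clip, "car", toy_detect, toy_segment, toy_track)))
```

Because the stages are passed in as callables, a real detector, segmenter, or tracker could replace a toy stage without changing `run_pipeline`. The containerized deployment mentioned in the abstract is outside the scope of this sketch.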
