SynSpill: Improved Industrial Spill Detection With Synthetic Data

Aaditya Baranwal; Abdul Mueez; Jason Voelker; Guneet Bhatia; Shruti Vyas

arXiv:2508.10171·cs.CV·April 23, 2026

TL;DR

SynSpill introduces a synthetic data generation pipeline that enhances industrial spill detection by enabling effective fine-tuning of vision-language models and object detectors in data-scarce, safety-critical environments.

Contribution

The paper presents a scalable synthetic data framework that improves the performance of VLMs and detectors for industrial spill detection, addressing data scarcity and privacy concerns.

Findings

01. Synthetic data boosts both VLM and object-detector performance on spill detection.

02. When no synthetic data is used, VLMs outperform detectors on unseen spill scenarios.

03. Combining synthetic data with lightweight fine-tuning yields performance comparable to that obtained with real data.

Abstract

Large-scale Vision-Language Models (VLMs) have transformed general-purpose visual recognition through strong zero-shot capabilities. However, their performance degrades significantly in niche, safety-critical domains such as industrial spill detection, where hazardous events are rare, sensitive, and difficult to annotate. This scarcity -- driven by privacy concerns, data sensitivity, and the infrequency of real incidents -- renders conventional fine-tuning of detectors infeasible for most industrial settings. We address this challenge by introducing a scalable framework centered on a high-quality synthetic data generation pipeline. We demonstrate that this synthetic corpus enables effective Parameter-Efficient Fine-Tuning (PEFT) of VLMs and substantially boosts the performance of state-of-the-art object detectors such as YOLO and DETR. Notably, in the absence of synthetic data…
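The abstract names Parameter-Efficient Fine-Tuning (PEFT) but this page does not say which method is used. As a rough illustration of why such fine-tuning is considered lightweight, the sketch below assumes a low-rank adapter in the style of LoRA, where a frozen weight matrix of shape d_out x d_in is adapted by training two small matrices of rank r. The function name, the layer size, and the rank are hypothetical values chosen for the example, not figures from the paper.

```python
def adapter_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, low_rank) trainable-parameter counts for one weight matrix.

    full     : parameters updated when the whole d_out x d_in matrix is trained
    low_rank : parameters updated when only a rank-`rank` adapter is trained,
               i.e. one (d_out x rank) matrix plus one (rank x d_in) matrix
    """
    if rank < 1 or rank > min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank


# Hypothetical layer size and rank, for illustration only.
full, low_rank = adapter_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full} trainable parameters")
print(f"low-rank adapter: {low_rank} trainable parameters")
print(f"fraction trained: {low_rank / full:.4%}")
```

Under these assumed sizes the adapter trains well under one percent of the parameters in that layer, which is the sense in which a small synthetic corpus can be enough to adapt a large model.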

Code & Models

Datasets

sochastic/SynSpill (dataset, 485 downloads)
