# Generative AI for Industrial Contour Detection: A Language-Guided Vision System

**Authors:** Liang Gong, Tommy (Zelin) Wang, Sara Chaker, Yanchen Dong, Fouad Bousetouane, Brenden Morton, Mark Mendez

arXiv: 2509.00284 · 2025-09-03

## TL;DR

This paper introduces a language-guided generative vision system for remnant contour detection in manufacturing. It pairs a conditional GAN with vision-language refinement driven by human-in-the-loop prompts, which improves contour fidelity and reduces manual tracing.

## Contribution

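The paper presents a three-stage system: data acquisition and preprocessing, conditional-GAN contour generation, and vision-language contour refinement. On the proprietary FabTrack datasets, the system improves contour fidelity, and the paper also benchmarks several vision-language models for the refinement stage.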
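This summary does not state the training objective of the conditional GAN. For reference only, a widely used objective for image-to-image conditional GANs is the pix2pix formulation. It combines an adversarial term with an L1 reconstruction term. Whether the FabTrack system uses this exact loss is an assumption. In the notation below, $x$ is the preprocessed input image, $y$ the target contour map, $G$ the generator, $D$ the discriminator, and $\lambda$ a weighting hyperparameter. The noise input of the original pix2pix formulation is omitted.

```latex
\mathcal{L}_{\text{cGAN}}(G, D) =
  \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x}\left[\log\left(1 - D\left(x, G(x)\right)\right)\right]

G^{*} = \arg\min_{G} \max_{D} \;
  \mathcal{L}_{\text{cGAN}}(G, D)
  + \lambda \, \mathbb{E}_{x,y}\left[\lVert y - G(x) \rVert_{1}\right]
```

The L1 term pulls generated contours toward the reference geometry pixel by pixel. The adversarial term encourages sharp, plausible edges rather than blurred averages.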

## Key findings

- Improved contour fidelity, with better edge continuity and geometric alignment.
- Under standardized conditions, GPT-image-1 consistently outperformed Gemini 2.0 Flash in structural accuracy and perceptual quality.
- Reduced manual tracing in industrial contour detection.
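The findings mention geometric alignment, but this summary does not name the metric used. One common way to measure how closely a predicted contour follows a reference contour (for example, a CAD-traced outline) is the symmetric Chamfer distance between the two point sets. The following is a minimal sketch that assumes contours are lists of `(x, y)` points. The function names, including the recall-style `coverage_within` score, are hypothetical and are not the paper's evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _mean_nearest(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Average distance from each point in src to its nearest neighbor in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean of the two directed nearest-neighbor averages."""
    if not pred or not ref:
        raise ValueError("both contours must contain at least one point")
    return 0.5 * (_mean_nearest(pred, ref) + _mean_nearest(ref, pred))


def coverage_within(pred: Sequence[Point], ref: Sequence[Point], tol: float) -> float:
    """Hypothetical recall-style score: share of reference points with a predicted point within tol."""
    if not ref:
        raise ValueError("reference contour must not be empty")
    if not pred:
        return 0.0
    hits = sum(1 for r in ref if min(math.dist(r, p) for p in pred) <= tol)
    return hits / len(ref)


ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(x, y + 1.0) for x, y in ref]   # same shape, offset by one unit
print(chamfer_distance(ref, ref))          # 0.0
print(chamfer_distance(shifted, ref))      # 1.0
print(coverage_within([(0.0, 0.0)], ref, tol=1.0))  # 2 of 3 reference points covered
```

Lower Chamfer distance indicates tighter alignment. A coverage score near 1.0 suggests that little of the reference outline would need to be traced by hand.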

## Abstract

Industrial computer vision systems often struggle with noise, material variability, and uncontrolled imaging conditions, limiting the effectiveness of classical edge detectors and handcrafted pipelines. In this work, we present a language-guided generative vision system for remnant contour detection in manufacturing, designed to achieve CAD-level precision. The system is organized into three stages: data acquisition and preprocessing, contour generation using a conditional GAN, and multimodal contour refinement through vision-language modeling, where standardized prompts are crafted in a human-in-the-loop process and applied through image-text guided synthesis. On proprietary FabTrack datasets, the proposed system improved contour fidelity, enhancing edge continuity and geometric alignment while reducing manual tracing. For the refinement stage, we benchmarked several vision-language models, including Google's Gemini 2.0 Flash, OpenAI's GPT-image-1 integrated within a VLM-guided workflow, and open-source baselines. Under standardized conditions, GPT-image-1 consistently outperformed Gemini 2.0 Flash in both structural accuracy and perceptual quality. These findings demonstrate the promise of VLM-guided generative workflows for advancing industrial computer vision beyond the limitations of classical pipelines.
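The three stages in the abstract can be organized as a small pipeline in which each stage can be swapped out. This sketch assumes grayscale images as nested lists and binary contour masks. `generate_contour` is a simple gradient-threshold stand-in for the conditional GAN. `REFINE_PROMPT` is a hypothetical example of a standardized prompt, because the paper's actual prompts are not reproduced here. The `Refiner` callable marks where a client for a vision-language model such as GPT-image-1 or Gemini 2.0 Flash would be wrapped. None of these names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # grayscale image, row-major
Mask = List[List[int]]     # binary contour mask (1 = contour pixel)

# Hypothetical standardized prompt; the paper's prompts were crafted
# in a human-in-the-loop process and are not shown in this summary.
REFINE_PROMPT = (
    "Redraw the outer contour of the remnant as a single closed, "
    "continuous line on a white background. Do not add or remove geometry."
)


def preprocess(img: Image) -> Image:
    """Stage 1 (sketch): min-max normalize intensities to [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1.0  # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in img]


def generate_contour(img: Image, threshold: float = 0.5) -> Mask:
    """Stage 2 placeholder: forward-difference gradient threshold standing in for the cGAN."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]  # clamped at the right border
            gy = img[min(y + 1, h - 1)][x] - img[y][x]  # clamped at the bottom border
            if abs(gx) + abs(gy) >= threshold:
                mask[y][x] = 1
    return mask


# Stage 3 interface: any callable taking (image, coarse contour, prompt)
# and returning a refined contour, e.g. a wrapper around a VLM client.
Refiner = Callable[[Image, Mask, str], Mask]


def identity_refiner(img: Image, contour: Mask, prompt: str) -> Mask:
    """Hypothetical no-op refiner used when no vision-language model is configured."""
    return contour


@dataclass
class ContourPipeline:
    refiner: Refiner = identity_refiner
    prompt: str = REFINE_PROMPT

    def run(self, raw: Image) -> Mask:
        img = preprocess(raw)                         # stage 1
        coarse = generate_contour(img)                # stage 2
        return self.refiner(img, coarse, self.prompt)  # stage 3


raw = [[0, 0, 0, 0], [0, 10, 10, 0], [0, 10, 10, 0], [0, 0, 0, 0]]
for row in ContourPipeline().run(raw):
    print(row)
```

Injecting the refiner as a callable reflects the benchmarking setup described in the abstract. Different vision-language models can then be compared under the same prompt and on the same coarse contours.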

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00284/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00284/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/2509.00284/full.md

---
Source: https://tomesphere.com/paper/2509.00284