SynthPID: P&ID digitization from Topology-Preserving Synthetic Data
Suraj Prasad, Pinak Mahapatra

TL;DR
SynthPID is a corpus of 665 synthetic P&IDs whose pipe topology is seeded from real drawings, enabling diagram digitization models to be trained without any real data.
Contribution
The paper presents SynthPID, a synthetic P&ID dataset whose pipe topology is seeded from real drawings, and pairs it with a patch-based Relationformer adapted for high-resolution diagrams, reporting 63.8% edge mAP without real training data.
Findings
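Synthetic data whose topology is seeded from real drawings improves edge detection over template-based generators, which yield only approximately 33% edge detection accuracy under synth-only training.
A model trained on SynthPID achieves 63.8% edge mAP without real data.
Performance plateaus beyond 400 synthetic images, highlighting the importance of seed diversity.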
Abstract
Automating the digitization of Piping and Instrumentation Diagrams (P&IDs) into structured process graphs would unlock significant value in plant operations, yet progress is bottlenecked by a fundamental data problem: engineering drawings are proprietary, and the entire community shares a single public benchmark of just 12 annotated images. Prior attempts at synthetic augmentation have fallen short because template-based generators scatter symbols at random, producing graphs that bear little resemblance to real process plants and, accordingly, yield only approximately 33% edge detection accuracy under synth-only training. We argue the failure is structural rather than visual and address it by introducing SynthPID, a corpus of 665 synthetic P&IDs whose pipe topology is seeded directly from real drawings. Paired with a patch-based Relationformer adapted for high-resolution diagrams, a…
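The contrast the abstract draws between random symbol scattering and topology seeding can be illustrated with a minimal sketch. This is not the authors' generator: the seed graph, the function names, and the position-jitter perturbation are all hypothetical, chosen only to show what "preserving topology" means for a process graph (the edge list and symbol classes stay fixed while the layout varies).

```python
import random

# Hypothetical seed graph, standing in for one annotated real drawing:
# node id -> (symbol class, (x, y)); edges are pipe connections.
SEED_NODES = {
    0: ("pump", (100.0, 400.0)),
    1: ("valve", (300.0, 400.0)),
    2: ("tank", (500.0, 200.0)),
    3: ("instrument", (300.0, 150.0)),
}
SEED_EDGES = [(0, 1), (1, 2), (1, 3)]


def topology_seeded_variant(nodes, edges, jitter=20.0, rng=None):
    """Keep the seed's connectivity and symbol classes; perturb only positions.

    The uniform jitter is an illustrative assumption, not the paper's method.
    """
    if rng is None:
        rng = random.Random()
    new_nodes = {
        nid: (cls, (x + rng.uniform(-jitter, jitter),
                    y + rng.uniform(-jitter, jitter)))
        for nid, (cls, (x, y)) in nodes.items()
    }
    return new_nodes, list(edges)


def random_scatter_baseline(nodes, n_edges, width=600.0, height=600.0, rng=None):
    """Template-style baseline: random positions and random connections."""
    if rng is None:
        rng = random.Random()
    new_nodes = {
        nid: (cls, (rng.uniform(0, width), rng.uniform(0, height)))
        for nid, (cls, _) in nodes.items()
    }
    ids = sorted(nodes)
    all_pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    return new_nodes, rng.sample(all_pairs, n_edges)


def degree_sequence(nodes, edges):
    """Sorted node degrees, a simple fingerprint of graph structure."""
    deg = {nid: 0 for nid in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return sorted(deg.values())
```

In this sketch, every seeded variant has the same degree sequence as the seed, whereas the scatter baseline carries no such guarantee; that structural difference is the point the abstract makes when it calls the failure of template-based generators "structural rather than visual".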
