Training a Computer Vision Model for Commercial Bakeries with Primarily Synthetic Images
Thomas H. Schmitt, Maximilian Bundscherer, Tobias Bocklet

TL;DR
This paper develops a computer vision system for detecting bakery products. It trains YOLOv8 and YOLOv9 on an expanded dataset of 2432 images, supplemented with synthetic images generated by pix2pix and CycleGAN, and its best model reaches 90.3% AP@0.5 on the test set.
Contribution
It introduces an expanded dataset and synthetic image generation to improve bakery product detection with YOLO models.
Findings
Achieved an average precision (AP@0.5) of 90.3% on the test set.
Synthetic images effectively enhanced model robustness.
Extended the work of [SBB23] with a larger dataset of 2432 images covering a wider range of baked goods.
Abstract
In the food industry, reprocessing returned product is a vital step to increase resource efficiency. [SBB23] presented an AI application that automates the tracking of returned bread buns. We extend their work by creating an expanded dataset comprising 2432 images and a wider range of baked goods. To increase model robustness, we use the generative models pix2pix and CycleGAN to create synthetic images. We train the state-of-the-art object detection models YOLOv9 and YOLOv8 on our detection task. Our overall best-performing model achieved an average precision AP@0.5 of 90.3% on our test set.
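For readers unfamiliar with the metric, the following is a minimal sketch of how an average precision at an IoU threshold of 0.5 can be computed for a single class. It is not the authors' evaluation code. The function names `iou` and `average_precision`, the `(x1, y1, x2, y2)` box format, and the all-point interpolation are assumptions made for illustration. For brevity it treats all boxes as belonging to one image, whereas real evaluation tools match detections per image and may interpolate differently.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class (hypothetical helper, single-image simplification).

    preds: list of (confidence, box) detections.
    gts:   list of ground-truth boxes.
    """
    if not gts:
        return 0.0
    # Process detections from most to least confident.
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = set()   # indices of ground-truth boxes already claimed
    tp_flags = []     # True if the detection is a true positive
    for _, box in preds:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_thr:
            matched.add(best_j)
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A detection counts as correct only if it overlaps an unmatched ground-truth box with IoU of at least 0.5, so a reported 90.3% AP@0.5 summarizes the precision-recall trade-off at that fairly lenient localization threshold.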
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Industrial Vision Systems and Defect Detection
Methods: Residual Connection · Residual Block · Cycle Consistency Loss · Tanh Activation · PatchGAN · Dropout · GAN Least Squares Loss · Sigmoid Activation · Batch Normalization
