Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics
Alexander Naumann, Felix Hertlein, Benchun Zhou, Laura Dörr, and Kai Furmans

TL;DR
This paper introduces an automated pipeline for generating synthetic datasets for instance segmentation, reducing reliance on real-world data and improving transferability to real images, demonstrated on parcel logistics.
Contribution
The authors present a comprehensive, fully automated dataset generation pipeline from image scraping to composition, including novel insights on image selection and blending methods for domain adaptation.
Findings
Successful transfer to real test images with Mask AP 86.2
A broader image selection helps bridge the domain gap
Blending methods outperform simple copy-and-paste
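The last finding contrasts blending with simple copy-and-paste. This excerpt does not name the blending methods the authors compare. The sketch below therefore uses one common idea as an assumption: soften the binary object mask before alpha compositing. A box blur stands in for the Gaussian blur usually used for this. The helper names `box_blur` and `composite` are hypothetical, and images are tiny grayscale nested lists so the example needs only the standard library.

```python
from typing import List

Image = List[List[float]]  # grayscale image or mask, values in [0, 1]


def box_blur(mask: Image, radius: int = 1) -> Image:
    """Average each pixel over a (2r+1)x(2r+1) window, ignoring out-of-bounds pixels.

    Stand-in for a Gaussian blur (assumption): it softens the mask edge.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += mask[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def composite(background: Image, foreground: Image, alpha: Image) -> Image:
    """Per-pixel alpha compositing: out = alpha * fg + (1 - alpha) * bg."""
    return [
        [a * f + (1.0 - a) * b for b, f, a in zip(bg_row, fg_row, a_row)]
        for bg_row, fg_row, a_row in zip(background, foreground, alpha)
    ]


# Toy example: a bright 3x3 object pasted onto a dark 5x5 background.
bg = [[0.0] * 5 for _ in range(5)]
fg = [[1.0] * 5 for _ in range(5)]
mask = [[1.0 if 1 <= y <= 3 and 1 <= x <= 3 else 0.0 for x in range(5)] for y in range(5)]

hard = composite(bg, fg, mask)            # copy-and-paste: sharp object boundary
soft = composite(bg, fg, box_blur(mask))  # blended: boundary fades over a few pixels
```

With a hard mask, the pasted object ends in a sharp step that a detector can latch onto as an artifact of synthesis. The softened mask spreads that transition over neighboring pixels (here the corner pixel receives 0.25 of the object). Note that blurring also fades the object's own border pixels, which is one reason pipelines often combine several blending variants.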
Abstract
State-of-the-art approaches in computer vision heavily rely on sufficiently large training datasets. For real-world applications, obtaining such a dataset is usually a tedious task. In this paper, we present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. We first scrape images for the objects of interest from popular image search engines, and since we rely only on text-based queries, the resulting data comprises a wide variety of images. Hence, image selection is necessary as a second step. This approach of image scraping and selection relaxes the need for a real-world domain-specific dataset that must be either publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three…
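The excerpt stops before it details the composition step, so the following is only a sketch under stated assumptions. It assumes that background removal yields a binary cutout mask per object and that cutouts are pasted at uniformly random positions. The key bookkeeping for instance segmentation is that later pastes occlude earlier ones, so each instance mask must keep only its visible pixels. The function `paste_instances` and its placement rule are hypothetical.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]  # binary mask, 1 = object pixel


def paste_instances(
    height: int, width: int, cutouts: List[Mask], rng: random.Random
) -> Tuple[List[List[int]], List[Mask]]:
    """Paste binary cutouts at random offsets on an empty canvas.

    Returns an id map (0 = background, k = instance k) and one full-size
    visible-pixel mask per instance. Assumes every cutout fits the canvas.
    """
    id_map = [[0] * width for _ in range(height)]
    for k, cut in enumerate(cutouts, start=1):
        ch, cw = len(cut), len(cut[0])
        # Hypothetical placement rule: uniform offset keeping the cutout inside.
        top = rng.randint(0, height - ch)
        left = rng.randint(0, width - cw)
        for y in range(ch):
            for x in range(cw):
                if cut[y][x]:
                    # Overwriting earlier ids models occlusion by later objects.
                    id_map[top + y][left + x] = k
    # Derive each instance mask from the id map, so occluded pixels drop out.
    masks = [
        [[1 if id_map[y][x] == k else 0 for x in range(width)] for y in range(height)]
        for k in range(1, len(cutouts) + 1)
    ]
    return id_map, masks


rng = random.Random(0)
square = [[1, 1], [1, 1]]
id_map, masks = paste_instances(4, 4, [square, square, square], rng)
```

Deriving masks from a single id map guarantees that no pixel is assigned to two instances. An earlier instance can end up partially or fully hidden, so a practical pipeline would likely drop or re-sample heavily occluded objects rather than emit near-empty masks.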
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Test
