A Reproducible Workflow for Scraping, Structuring, and Segmenting Legacy Archaeological Artifact Images
Juan Palomeque-Gonzalez

TL;DR
This technical note introduces a reproducible workflow that combines web scraping with classical computer vision to convert legacy archaeological images into structured, segmentation-ready datasets for digital archaeology research.
Contribution
It presents open-source tools for automated image retrieval and processing, so that structured datasets can be built from legacy collections without redistributing the original images.
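As an illustration of the retrieval side, a minimal Python sketch of a politeness check is given here. It is not the paper's script, and whether that script consults robots.txt is not stated in this summary. The user agent string, the sample robots.txt content, the example.org URLs, and the fallback delay are all assumptions made for illustration. The sketch only decides which pages may be requested and how long to wait between requests; it performs no network access.

```python
import urllib.robotparser

# Hypothetical user agent; a real scraper should identify itself and its operator.
USER_AGENT = "example-research-bot"

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network call."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_plan(urls, rules, default_delay=2.0):
    """Return the URLs the rules allow and the delay (seconds) between requests."""
    # Fall back to an assumed default when no Crawl-delay is declared.
    delay = rules.crawl_delay(USER_AGENT) or default_delay
    allowed = [u for u in urls if rules.can_fetch(USER_AGENT, u)]
    return allowed, delay


candidate_urls = [
    "https://example.org/records/1",
    "https://example.org/private/notes",
    "https://example.org/records/2",
]
allowed, delay = polite_plan(candidate_urls, build_rules(ROBOTS_TXT))
```

A downloader built on this would sleep for `delay` seconds between requests to the `allowed` URLs. Compliance with a site's Terms of Use still has to be checked by hand.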
Findings
Successfully retrieved and processed thousands of archaeological images
Generated segmentation masks and annotations compatible with machine learning workflows
Facilitated reproducible and ethical data handling in digital archaeology
Abstract
This technical note presents a reproducible workflow for converting a legacy archaeological image collection into a structured and segmentation-ready dataset. The case study focuses on the Lower Palaeolithic hand axe and biface collection curated by the Archaeology Data Service (ADS), a dataset that provides thousands of standardised photographs but no mechanism for bulk download or automated processing. To address this, two open-source tools were developed: a web scraping script that retrieves all record pages, extracts associated metadata, and downloads the available images while respecting ADS Terms of Use and ethical scraping guidelines; and an image processing pipeline that renames files using UUIDs, generates binary masks and bounding boxes through classical computer vision, and stores all derived information in a COCO-compatible JSON file enriched with archaeological metadata.…
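The processing steps named in the abstract (UUID renaming, binary mask, bounding box, COCO-style record) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's pipeline: it uses a plain global threshold on a grayscale grid with a light background, whereas the abstract does not say which classical technique is used. The function names, the threshold value, the `metadata` keys, and the single `artifact` category are all hypothetical.

```python
import json
import uuid


def binary_mask(gray, threshold=128):
    """Mark pixels darker than the (assumed) threshold as foreground (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(mask):
    """Return a COCO-style [x, y, width, height] box, or None for an empty mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    x0, y0 = min(cols), min(rows)
    return [x0, y0, max(cols) - x0 + 1, max(rows) - y0 + 1]


def coco_record(gray, metadata, image_id=1):
    """Build a minimal COCO-style dict for one image with one object."""
    mask = binary_mask(gray)
    bbox = bounding_box(mask)
    annotations = []
    if bbox is not None:
        annotations.append({
            "id": 1,
            "image_id": image_id,
            "category_id": 1,
            "bbox": bbox,
            "area": sum(map(sum, mask)),  # number of foreground pixels
            "iscrowd": 0,
        })
    return {
        "images": [{
            "id": image_id,
            "file_name": f"{uuid.uuid4()}.png",  # UUID-based replacement name
            "height": len(gray),
            "width": len(gray[0]),
            "metadata": metadata,  # hypothetical slot for archaeological fields
        }],
        "annotations": annotations,
        "categories": [{"id": 1, "name": "artifact"}],
    }


# Tiny synthetic image: dark object on a white background.
gray = [
    [255, 255, 255, 255, 255],
    [255, 255, 40, 60, 255],
    [255, 255, 30, 50, 255],
    [255, 255, 45, 255, 255],
    [255, 255, 255, 255, 255],
]
record = coco_record(gray, {"site": "hypothetical site name"})
coco_json = json.dumps(record, indent=2)
```

On real photographs an image library would supply the pixel data, and a global threshold would probably need to be replaced or supplemented (for example with adaptive thresholding or morphological clean-up) to cope with shadows and uneven backgrounds.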
Taxonomy
Topics: Image Processing and 3D Reconstruction · Pleistocene-Era Hominins and Archaeology · Archaeology and ancient environmental studies
