Project Pipeline: Preservation, Persistence, and Performance
Jane Greenberg, Christopher B. Rauch, Mat Kelly (Drexel University)

TL;DR
This paper presents a pipeline for transforming historical analog vocabularies into digital, linked data formats, implementing persistent identifiers, and integrating them into a system for research and analysis.
Contribution
The paper introduces a pipeline that converts the 1910 LCSH into SKOS, implements PIDs with a prototype ARK resolver, and imports the result into HIVE for automatic metadata generation and scholarly analysis.
Findings
Successfully transformed 1910 LCSH to SKOS format
Implemented persistent identifiers with ARK resolver
Enabled automatic metadata generation in HIVE
Abstract
Preservation pipelines demonstrate extended value when digitized content is also computation ready. Extending this to historical controlled vocabularies published in analog format requires additional steps if the vocabularies are to be fully leveraged for research. This paper reports on a pipeline and project progress addressing this challenge through three key goals: 1) transforming the 1910 Library of Congress Subject Headings (LCSH) to the Simple Knowledge Organization System (SKOS) linked data standard, 2) implementing persistent identifiers (PIDs) and launching our prototype ARK resolver, and 3) importing the 1910 LCSH into the Helping Interdisciplinary Vocabulary Engineering (HIVE) System to support automatic metadata generation and scholarly analysis of the historical record. The discussion considers the implications of our work in the broader context of preservation,…
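To make the first two goals concrete, the following is a minimal Python sketch of how a single transcribed heading might be expressed as a SKOS concept identified by an ARK. It is not the authors' implementation. The resolver base URL, the `Heading` record, the helper names, and the sample labels are all hypothetical, and the NAAN `99999` is used only as a placeholder value.

```python
from dataclasses import dataclass, field

# Hypothetical resolver host and placeholder NAAN; not the project's real values.
RESOLVER_BASE = "https://resolver.example.org/"
PLACEHOLDER_NAAN = "99999"

TURTLE_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


@dataclass
class Heading:
    """One subject heading as it might be transcribed from the printed list."""
    local_id: str                       # assumed local identifier
    pref_label: str                     # the authorized heading
    alt_labels: list[str] = field(default_factory=list)   # "see" references
    related_ids: list[str] = field(default_factory=list)  # "see also" references


def ark_uri(local_id: str) -> str:
    """Build a resolvable ARK URI for a heading (assumed URI layout)."""
    return f"{RESOLVER_BASE}ark:/{PLACEHOLDER_NAAN}/{local_id}"


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(heading: Heading) -> str:
    """Serialize one heading as a skos:Concept in Turtle syntax."""
    predicates = [
        "a skos:Concept",
        f'skos:prefLabel "{escape(heading.pref_label)}"@en',
    ]
    for alt in heading.alt_labels:
        predicates.append(f'skos:altLabel "{escape(alt)}"@en')
    for rid in heading.related_ids:
        predicates.append(f"skos:related <{ark_uri(rid)}>")
    subject = f"<{ark_uri(heading.local_id)}>"
    return TURTLE_PREFIX + subject + "\n    " + " ;\n    ".join(predicates) + " .\n"


if __name__ == "__main__":
    # Illustrative labels only; not taken from the 1910 LCSH.
    sample = Heading("h0001", "Example heading", ["Example variant"], ["h0002"])
    print(to_turtle(sample))
```

Using the ARK as the concept URI means the same identifier serves both as the linked data subject and as the persistent link a resolver can redirect, which is one plausible way to connect goals 1 and 2.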
Taxonomy
Topics: Research Data Management Practices · Digital Humanities and Scholarship · Digital and Traditional Archives Management
