Towards a Semantic Preservation System
Robert E. McGrath, Jason Kastner, Alejandro Rodriguez, Jim Myers

TL;DR
This paper discusses a system that preserves access to file content by extracting meaningful logical structures using a standard format description language, reducing long-term preservation costs.
Contribution
It introduces the Defuddle parser, a generic parser that uses DFDL-style format descriptions to extract logical structures from ASCII or binary files, so that little format-specific software is needed for preservation.
Findings
Provides a universal approach to logical content extraction
Reduces software and maintenance costs for long-term preservation
Separates bits, formats, and logical content issues effectively
Abstract
Preserving access to file content requires preserving not just bits but also meaningful logical structures. The Data Format Description Language (DFDL), currently under development, is a completely general standard that addresses this need. The Defuddle parser is a generic parser that can use DFDL-style format descriptions to extract logical structures from ASCII or binary files written in those formats. Together, DFDL and Defuddle provide a preservation capability that requires minimal format-specific software and cleanly separates issues related to bits, formats, and logical content. Such a system has the potential to greatly reduce overall system development and maintenance costs as well as the per-file-format costs for long-term preservation.
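The idea of a generic parser driven by a declarative format description can be illustrated with a minimal Python sketch. This is not DFDL syntax and not the Defuddle API: the `RECORD_FORMAT` list, the `parse_records` function, and the field names are all hypothetical, chosen only to show how format knowledge can live in data rather than in format-specific code.

```python
import struct

# Hypothetical format description: (field name, struct type code) pairs.
# It stands in for a DFDL-style schema; it is NOT actual DFDL.
RECORD_FORMAT = [
    ("station_id", "H"),   # unsigned 16-bit integer
    ("temperature", "f"),  # 32-bit IEEE float
    ("flag", "B"),         # unsigned 8-bit integer
]


def parse_records(data, description, byte_order="<"):
    """Generic parser: turn raw bytes into logical records.

    All format knowledge comes from `description`; this function
    contains nothing specific to any one file format.
    """
    layout = struct.Struct(byte_order + "".join(code for _, code in description))
    names = [name for name, _ in description]
    if len(data) % layout.size != 0:
        raise ValueError("data length is not a whole number of records")
    # Each unpacked tuple becomes a dict keyed by the logical field names.
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]
```

Supporting a new fixed-layout binary format in this sketch means writing a new description list, not a new parser, which is the cost argument the abstract makes for separating bits, formats, and logical content.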
Taxonomy
Topics: Advanced Data Storage Technologies · Scientific Computing and Data Management · Research Data Management Practices
