Prints in the Magnetic Dust: Robust Similarity Search in Legacy Media Images Using Checksum Count Vectors

Maciej Grzeszczuk; Kinga Skorupska; Grzegorz M. Wójcik

arXiv:2604.09657·cs.CV·April 14, 2026

TL;DR

This paper introduces a checksum-based feature representation for robust similarity search in legacy media images, enabling automated detection of duplicates and variants in damaged digital artifacts.

Contribution

It proposes a novel Checksum Count Vector method and demonstrates its effectiveness in identifying duplicates and variants in large, damaged digital media collections.

Findings

1. 58% accuracy in detecting variants
2. 97% accuracy in identifying alternative copies
3. Effective on damaged recordings with up to 75% of records missing

Abstract

Digitizing magnetic media containing computer data is only the first step towards the preservation of early home computing era artifacts. The audio tape images must be decoded, verified, repaired if necessary, tested, and documented. If parts of this process could be effectively automated, volunteers could focus on contributing contextual and historical knowledge rather than struggling with technical tools. We therefore propose a feature representation based on Checksum Count Vectors and evaluate its applicability to detecting duplicates and variants of recordings within a large data store. The approach was tested on a collection of decoded tape images (n=4902), achieving 58% accuracy in detecting variants and 97% accuracy in identifying alternative copies, for damaged recordings with up to 75% of records missing. These results represent an important step towards fully automated…
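
The abstract does not describe how a Checksum Count Vector is built or compared. The sketch below is one plausible reading, offered only as an illustration and not as the authors' method: it assumes each decoded tape image yields a list of per-record checksum values, counts how often each value occurs, and scores two images by the share of checksum occurrences they have in common relative to the smaller image. The function names `checksum_count_vector` and `overlap_similarity`, the choice of the overlap coefficient, and the sample checksum values are all assumptions made for this example.

```python
from collections import Counter


def checksum_count_vector(record_checksums):
    """Count how often each checksum value occurs among one image's records.

    Assumption: `record_checksums` is an iterable of per-record checksum
    values (e.g. small integers) taken from a decoded tape image.
    """
    return Counter(record_checksums)


def overlap_similarity(a, b):
    """Overlap coefficient between two count vectors (hypothetical metric).

    Shared checksum occurrences divided by the size of the smaller vector,
    so a copy that lost some records can still score highly against the
    complete recording.
    """
    shared = sum((a & b).values())  # Counter '&' keeps the minimum counts
    smaller = min(sum(a.values()), sum(b.values()))
    return shared / smaller if smaller else 0.0


# Made-up checksum values for illustration only.
complete = checksum_count_vector([0x3A, 0x7F, 0x7F, 0x10])
damaged = checksum_count_vector([0x7F, 0x10])    # half of the records missing
unrelated = checksum_count_vector([0x01, 0x02])
```

Under these assumptions, `overlap_similarity(complete, damaged)` is 1.0 because every surviving record of the damaged copy also appears in the complete one, while `overlap_similarity(complete, unrelated)` is 0.0. Normalizing by the smaller vector is a design choice made here to tolerate missing records; a cosine or Jaccard score would penalize the damaged copy for what it lacks.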
