Quantifying Orphaned Annotations in Hypothes.is
Mohamed Aturban, Michael L. Nelson, Michele C. Weigle

TL;DR
This paper measures how often Hypothes.is annotations become orphaned. About 22% of highlighted text annotations no longer attach to their live web pages, and public web archives recover only about 12% of those. The results point to the need to archive a page at the time it is annotated.
Contribution
It provides the first large-scale quantification of orphaned annotations in Hypothes.is and emphasizes the importance of archiving web pages at annotation creation.
Findings
About 22% of highlighted text annotations can no longer be attached to their live pages
Only about 12% of those unattached annotations can be reattached using public web archives; the remaining 88% are orphaned
53% of still-attached annotations risk becoming orphans if the live page changes
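To make the scale of these percentages concrete, the sketch below converts them into rough annotation counts. It assumes the 20,953 highlighted text annotations reported in the abstract as the base; because the published percentages are rounded ("about 22%", "about 12%"), the resulting counts are approximations, not figures taken from the paper. The function and constant names are illustrative.

```python
# Rough counts implied by the rounded percentages listed above.
# These are back-of-the-envelope estimates, not numbers from the paper.
TOTAL = 20_953           # highlighted text annotations studied (from the abstract)
UNATTACHED_RATE = 0.22   # "about 22%" no longer attach to the live page
RECOVERED_RATE = 0.12    # "about 12%" of those reattach via public web archives
AT_RISK_RATE = 0.53      # 53% of still-attached annotations are at risk


def implied_counts(total: int = TOTAL) -> dict[str, int]:
    """Turn the reported rates into approximate annotation counts."""
    unattached = total * UNATTACHED_RATE
    recovered = unattached * RECOVERED_RATE
    orphaned = unattached - recovered   # the remaining 88%
    attached = total - unattached
    at_risk = attached * AT_RISK_RATE
    return {
        "unattached": round(unattached),
        "recovered": round(recovered),
        "orphaned": round(orphaned),
        "attached": round(attached),
        "at_risk": round(at_risk),
    }


if __name__ == "__main__":
    for name, count in implied_counts().items():
        print(f"{name:>10}: ~{count:,}")
```

Under these assumptions, roughly 4,600 annotations are unattached, of which only about 550 are reattached through archives and about 4,060 remain orphaned, which is close to a fifth of all annotations studied.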
Abstract
Web annotation has been receiving increased attention recently with the organization of the Open Annotation Collaboration and new tools for open annotation, such as Hypothes.is. We investigate the prevalence of orphaned annotations, where neither the live Web page nor an archived copy of the Web page contains the text that had previously been annotated in the Hypothes.is annotation system (containing 20,953 highlighted text annotations). We found that about 22% of highlighted text annotations can no longer be attached to their live Web pages. Unfortunately, only about 12% of these annotations can be reattached using the holdings of current public web archives, leaving the remaining 88% of these annotations orphaned. For those annotations that are still attached, 53% are in danger of becoming orphans if the live Web page changes. This points to the need for archiving the target of an…
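The definition of an orphaned annotation in the abstract can be sketched as a three-way classification. This is a minimal illustration, not the authors' method: it assumes the page text has already been fetched and extracted, and it uses exact substring matching after whitespace and case normalization, whereas a real annotation tool may use more tolerant matching. The names `Status`, `normalize`, and `classify` are hypothetical.

```python
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    ATTACHED = "attached to live page"
    REATTACHED = "reattached via archived copy"
    ORPHANED = "orphaned"


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial layout changes do not matter."""
    return " ".join(text.split()).lower()


def classify(
    highlight: str,
    live_text: Optional[str],
    archived_texts: Iterable[str],
) -> Status:
    """Classify one annotation by where its highlighted text can still be found.

    live_text is None when the live page is unavailable.
    Assumption: exact substring match on normalized text.
    """
    target = normalize(highlight)
    if live_text is not None and target in normalize(live_text):
        return Status.ATTACHED
    if any(target in normalize(copy) for copy in archived_texts):
        return Status.REATTACHED
    return Status.ORPHANED


if __name__ == "__main__":
    quote = "orphaned annotations"
    live = "This page was rewritten and no longer mentions the topic."
    archives = ["An early study of Orphaned   Annotations on the web."]
    print(classify(quote, live, archives).value)
```

In this framing, an annotation that is attached today but has no archived copy containing its text would become orphaned as soon as the live page changes, which is the at-risk group the abstract describes.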
Taxonomy
Topics: Web Data Mining and Analysis · Topic Modeling · Natural Language Processing Techniques
