Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists
Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osbourne, Paul Koerbin, Michael L. Nelson, Michele C. Weigle

TL;DR
This paper reviews the diverse collection structures of eight web archive platforms, highlighting their features, differences, and implications for users, archivists, and developers in managing growing web archives.
Contribution
It provides a comprehensive comparison of collection concepts across multiple web archive platforms, informing better design and navigation of web archives.
Findings
Some collection structures support sub-collections, and some permit embargoes.
Curatorial decisions may be attributed to a single organization or to many.
Understanding structures aids navigation, design, and tool development.
Abstract
As web archives' holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms: Archive-It, Conifer, the Croatian Web Archive (HAW), the Internet Archive's user account web archives, Library of Congress (LC), PANDORA, Trove, and the UK Web Archive (UKWA). We note a plethora of different approaches to web archive collection structures. Some web archive collections support sub-collections and some permit embargoes. Curatorial decisions may be attributed to a single organization or many. Archived web pages are known by many names: mementos, copies, captures, or snapshots. Some platforms restrict a memento to a single collection and others allow mementos to cross collections. Knowledge of collection structures has implications for many different applications and users.…
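The concepts named in the abstract (collections, sub-collections, embargoes, curating organizations, and mementos that may appear in more than one collection) can be pictured with a small data model. The following is only an illustrative sketch: the class names, field names, and the helper `collections_containing` are hypothetical and do not reflect the schema of any of the eight platforms.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Memento:
    """An archived web page (also called a copy, capture, or snapshot)."""
    uri: str
    captured: date


@dataclass
class Collection:
    """Hypothetical collection record; all field names are assumptions."""
    name: str
    curators: List[str]                    # one organization or many
    embargo_until: Optional[date] = None   # None means no embargo
    mementos: List[Memento] = field(default_factory=list)
    sub_collections: List[Collection] = field(default_factory=list)

    def is_embargoed(self, today: date) -> bool:
        """True while an embargo date is set and has not yet been reached."""
        return self.embargo_until is not None and today < self.embargo_until

    def all_mementos(self) -> List[Memento]:
        """Mementos held directly plus those in every nested sub-collection."""
        found = list(self.mementos)
        for sub in self.sub_collections:
            found.extend(sub.all_mementos())
        return found


def collections_containing(memento: Memento,
                           collections: List[Collection]) -> List[str]:
    """Names of the top-level collections whose tree includes the memento."""
    return [c.name for c in collections if memento in c.all_mementos()]
```

In this sketch, a platform that restricts a memento to a single collection would always get a one-element list from `collections_containing`, while a platform that lets mementos cross collections could get several names.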
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Data Storage Technologies · Caching and Content Delivery
