Full-Text and URL Search Over Web Archives

Miguel Costa

arXiv:2108.01603·cs.DL·August 4, 2021

TL;DR

This paper discusses why full-text and URL search matter in web archives and how they give users effective access to historical web data.

Contribution

It describes methods for implementing full-text and URL search over web archives and addresses the temporal challenges that set this task apart from search on the live web.

Findings

1. Web archives are crucial for preserving societal history.
2. Full-text and URL search are essential for effective web archive access.
3. Temporal aspects introduce unique challenges in web archive search.

Abstract

Web archives are a historically valuable source of information. In some respects, web archives are the only record of the evolution of human society in the last two decades. They preserve a mix of personal and collective memories, the importance of which tends to grow as they age. However, the value of web archives depends on their users being able to search and access the information they require in efficient and effective ways. Without the possibility of exploring and exploiting the archived contents, web archives are useless. Web archive access functionalities range from basic browsing to advanced search and analytical services, accessed through user-friendly interfaces. Full-text and URL search have become the predominant and preferred forms of information discovery in web archives, fulfilling user needs and supporting search APIs that feed complex applications. Both full-text and…

Code & Models

Repositories

arquivo/pwa-technologies (Official)