(De)-Indexing and the Right to be Forgotten

Salvatore Vilella; Giancarlo Ruffo

arXiv:2501.03989·cs.CY·January 8, 2025

Open Access

TL;DR

This paper explores the technical challenges of implementing the right to be forgotten (RTBF) in search engines, discussing information retrieval (IR) models and the potential role of large language models (LLMs) in managing personal data removal.

Contribution

It provides an accessible overview of IR concepts and models relevant to de-indexing and the right to be forgotten, highlighting current challenges and future directions.

Findings

1. IR models are fundamental to de-indexing processes
2. LLMs can enhance data processing for RTBF
3. Balancing privacy and search efficiency remains complex

Abstract

In the digital age, the challenge of forgetfulness has emerged as a significant concern, particularly regarding the management of personal data and its accessibility online. The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access, yet implementing this right poses substantial technical difficulties for search engines. This paper aims to introduce non-experts to the foundational concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content. We will explore various IR models, including Boolean, probabilistic, vector space, and embedding-based approaches, as well as the role of Large Language Models (LLMs) in enhancing data processing capabilities. By providing this overview, we seek to highlight the complexities involved in…

Taxonomy

Topics: Digital Humanities and Scholarship