OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

Alisha Srivastava; Emir Korukluoglu; Minh Nhat Le; Duyen Tran; Chau Minh Pham; Marzena Karpinska; Mohit Iyyer

arXiv:2505.22945 · cs.CL · October 8, 2025

TL;DR

This study introduces OWL, a multilingual dataset to evaluate how well large language models memorize and recall texts across different languages, revealing significant cross-lingual memorization capabilities.

Contribution

The paper presents OWL, a novel dataset for probing cross-lingual memorization in LLMs, and evaluates models' ability to recall aligned texts across multiple languages.

Findings

01

LLMs can recall memorized content across languages.

02

Perturbations slightly decrease recall accuracy.

03

GPT-4o identifies titles/authors 69% of the time.

Abstract

Large language models (LLMs) are known to memorize and recall English text from their pretraining data. However, the extent to which this ability generalizes to non-English languages or transfers across languages remains unclear. This paper investigates multilingual and cross-lingual memorization in LLMs, probing if memorized content in one language (e.g., English) can be recalled when presented in translation. To do so, we introduce OWL, a dataset of 31.5K aligned excerpts from 20 books in ten languages, including English originals, official translations (Vietnamese, Spanish, Turkish), and new translations in six low-resource languages (Sesotho, Yoruba, Maithili, Malagasy, Setswana, Tahitian). We evaluate memorization across model families and sizes through three tasks: (1) direct probing, which asks the model to identify a book's title and author; (2) name cloze, which requires…
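This summary does not specify how direct-probing answers are scored. As a hedged illustration only, the sketch below assumes that a response counts as correct when it contains both the gold title and the gold author after simple lowercasing and whitespace normalization. The `Probe` class and the `is_correct` and `accuracy` helpers are hypothetical names. The two example books are toy data invented for illustration, not OWL excerpts.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Hypothetical record: one excerpt's gold metadata plus the model's raw reply.
    gold_title: str
    gold_author: str
    model_answer: str


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences are ignored.
    return " ".join(text.lower().split())


def is_correct(probe: Probe) -> bool:
    # Assumed criterion (not from the paper): title AND author must both appear.
    answer = normalize(probe.model_answer)
    return (normalize(probe.gold_title) in answer
            and normalize(probe.gold_author) in answer)


def accuracy(probes: list[Probe]) -> float:
    # Fraction of probes judged correct; returns 0.0 for an empty list.
    if not probes:
        return 0.0
    return sum(is_correct(p) for p in probes) / len(probes)


# Toy data for illustration only.
probes = [
    Probe("Pride and Prejudice", "Jane Austen",
          "This is Pride and Prejudice by Jane Austen."),
    Probe("Moby-Dick", "Herman Melville",
          "I think this is from Great Expectations."),
]
print(accuracy(probes))  # 0.5
```

Substring matching is deliberately lenient so that answers wrapped in extra prose still count; a real evaluation might instead need fuzzy matching to handle translated or transliterated titles.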

Code & Models

Repositories

emirkaan5/owl (official)

Videos

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices