MajinBook: An open catalogue of digitally mediated world literature
Antoine Mazières, Thierry Poibeau

TL;DR
MajinBook is an open, high-precision catalogue linking shadow library metadata with bibliographic data to support computational social science and cultural analytics.
Contribution
It introduces a novel methodology for linking shadow library data with structured bibliographic sources, creating a large, enriched corpus of digitally mediated books.
Findings
Created a corpus of over 539,000 English-language book references
Enriched entries with publication dates, genres, and popularity metrics
Evaluated linkage accuracy and released data openly
Abstract
This data paper introduces MajinBook, an open catalogue designed to facilitate the use of shadow libraries, such as Library Genesis and Z-Library, for computational social science and cultural analytics. By linking metadata from these vast, crowd-sourced archives with structured bibliographic data from Goodreads, we create a high-precision corpus of over 539,000 references to digitally mediated English-language books. Spanning three centuries and reflecting a contemporary selection bias, these entries are enriched with first publication dates, genres, and popularity metrics like ratings and reviews. Our methodology prioritises natively digital EPUB files to ensure machine-readable quality, while addressing biases in traditional corpora like HathiTrust, and includes secondary datasets for French, German, and Spanish. We evaluate the linkage strategy for accuracy, release all underlying…
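The abstract does not spell out the linkage procedure, so the following is only a minimal sketch of one conservative way such a link could be made: exact matching on a normalised (title, author) key, restricted to EPUB files and to unambiguous matches. The field names (`title`, `author`, `extension`, `first_published`) and the function names are assumptions made for illustration, not the schema or method of MajinBook.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    alnum_only = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(alnum_only.split())


def link_key(title: str, author: str) -> tuple:
    """Hypothetical matching key: normalised title plus normalised author."""
    return (normalise(title), normalise(author))


def link_records(shadow_records: list, catalogue_records: list) -> list:
    """Attach catalogue metadata to shadow-library records.

    Favours precision over recall: only EPUB files are considered, and a
    record is linked only when exactly one catalogue entry shares its key.
    """
    index = {}
    for entry in catalogue_records:
        index.setdefault(link_key(entry["title"], entry["author"]), []).append(entry)

    linked = []
    for record in shadow_records:
        if record.get("extension", "").lower() != "epub":
            continue  # skip scans and other non-EPUB formats
        candidates = index.get(link_key(record["title"], record["author"]), [])
        if len(candidates) == 1:  # ambiguous or missing matches are dropped
            enriched = dict(record)
            enriched["first_published"] = candidates[0]["first_published"]
            linked.append(enriched)
    return linked
```

Dropping ambiguous keys rather than guessing trades coverage for precision, which is consistent with the "high-precision" goal stated in the abstract; a real pipeline would likely need fuzzier matching and edition handling.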
