Targum -- A Multilingual New Testament Translation Corpus
Maciej Rapacz, Aleksander Smywiński-Pohl

TL;DR
Targum is a multilingual corpus of 651 New Testament translations (334 unique) in five European languages, each annotated with work, edition, and revision-year metadata to support linguistic and translation history analyses.
Contribution
It trades linguistic breadth for depth, offering 2.4-5.0x more translations per language than any prior corpus, and annotates each text with standardized identifiers so that researchers can study translation history at several levels of granularity.
Findings
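Contains 651 translations, 334 of them unique, across English, French, Italian, Polish, and Spanish.
Maps each translation to a standardized identifier for the work, its specific edition, and its year of revision.
Supports both micro-level studies of individual translation families and macro-level studies across the corpus.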
Abstract
Many European languages possess rich biblical translation histories, yet existing corpora - in prioritizing linguistic breadth - often fail to capture this depth. To address this gap, we introduce a multilingual corpus of 651 New Testament translations, of which 334 are unique, spanning five languages with 2.4-5.0x more translations per language than any prior corpus: English (194 unique versions from 390 total), French (41 from 78), Italian (17 from 33), Polish (29 from 48), and Spanish (53 from 102). Aggregated from 12 online biblical libraries and one preexisting corpus, each translation is annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization allows researchers to define "uniqueness" for their own needs: they can perform micro-level analyses on translation families, such as the KJV…
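The idea of letting researchers define "uniqueness" from the metadata can be illustrated with a minimal sketch. It is not the corpus's actual schema or tooling: the field names (`work_id`, `edition_id`, `revision_year`, `source`), the toy records, and the `unique_by` helper are all assumptions made for illustration. The per-language counts are the ones quoted in the abstract, and the sketch also checks that they add up to the stated totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # All field names are hypothetical; the real metadata layout may differ.
    language: str
    work_id: str        # standardized identifier for the work
    edition_id: str     # identifier for the specific edition
    revision_year: int  # year of revision
    source: str         # which online library the text came from


def unique_by(records, key):
    """Keep the first record seen for each distinct key value."""
    seen = {}
    for rec in records:
        seen.setdefault(key(rec), rec)
    return list(seen.values())


# Toy records (illustrative only, not taken from the corpus).
records = [
    Translation("en", "KJV", "KJV-1611", 1611, "library_a"),
    Translation("en", "KJV", "KJV-1769", 1769, "library_a"),
    Translation("en", "KJV", "KJV-1769", 1769, "library_b"),  # same edition, other source
    Translation("pl", "WORK_X", "WORK_X-ed1", 1900, "library_a"),
]

# Micro-level view: every edition/revision counts as its own version.
by_edition = unique_by(
    records, key=lambda r: (r.work_id, r.edition_id, r.revision_year)
)

# Macro-level view: all editions of one work collapse into a single entry.
by_work = unique_by(records, key=lambda r: r.work_id)

# Per-language counts quoted in the abstract: (unique, total).
counts = {
    "English": (194, 390),
    "French": (41, 78),
    "Italian": (17, 33),
    "Polish": (29, 48),
    "Spanish": (53, 102),
}
total_unique = sum(u for u, _ in counts.values())
total_all = sum(t for _, t in counts.values())

print(len(by_edition), len(by_work), total_unique, total_all)
```

Changing only the `key` function switches between a fine-grained notion of uniqueness (each revision is distinct) and a coarse one (each work counts once), which is the kind of flexibility the abstract attributes to the canonicalized metadata.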
