HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment
Anssi Yli-Jyrä, Josi Purhonen, Matti Liljeqvist, Arto Antturi, Pekka Nieminen, Kari M. Räntilä, Valtter Luoto

TL;DR
This paper presents HELFI, an openly available Hebrew-Greek-Finnish Bible corpus with detailed cross-lingual morpheme alignments, created through a process that avoids proprietary resources.
Contribution
It introduces a novel methodology for reconstructing a morphologically aligned multilingual Bible corpus using only free resources.
Findings
Created an open Hebrew-Greek-Finnish Bible corpus with morpheme alignments
Demonstrated a process to avoid proprietary resources in corpus creation
Produced a resource useful for cross-lingual and morphological studies
Abstract
Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher's permission to release its fine-grained alignment, but the alignment was still dependent on proprietary, third-party resources such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process starting from the creation of the original one-purpose database and ending with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, (iii) the…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
