ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus
Ayyoob Imani, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze

TL;DR
ParCourE is an online tool for exploring a large, word-aligned multilingual parallel corpus. It supports typological research across more than 1300 languages and can be adapted to other corpora.
Contribution
This paper introduces ParCourE, an online platform for browsing and analyzing a massively multilingual word-aligned parallel corpus, supporting typological studies and the creation of language resources.
Findings
Demonstrates usefulness for typological research.
Shows adaptability to different parallel corpora.
Provides insights into language similarities and properties.
Abstract
With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models, and creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool for browsing a word-aligned parallel corpus covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
