CaMEL: Case Marker Extraction without Labels
Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze

TL;DR
CaMEL is a new task: extracting case markers without labeled data. A first model for the task covers 83 languages, aiding linguistic analysis and low-resource language processing.
Contribution
The paper defines the CaMEL task and introduces a first model for it. The model identifies case markers without supervision, using only a massively multilingual corpus, a noun phrase chunker, and an alignment system.
Findings
Successfully extracted case markers in 83 languages
Constructed a silver standard from UniMorph for evaluation
Enabled analysis of cross-linguistic case system similarities
Abstract
We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.
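To make the idea in the abstract concrete, here is a minimal sketch of how noun phrase chunks and word alignments could be combined to surface candidate case markers. It is an illustration under stated assumptions, not the authors' model: the function names (`suffixes`, `rank_candidate_markers`), the suffix-share scoring, the thresholds, and the hand-made Finnish toy data with hand-written alignments are all hypothetical.

```python
from collections import Counter

def suffixes(word, max_len=3):
    """Word-final character n-grams up to max_len, always shorter than the word."""
    return [word[-n:] for n in range(1, min(max_len, len(word) - 1) + 1)]

def rank_candidate_markers(sentences, alignments, np_spans, max_len=3, min_count=2):
    """Rank suffixes by how strongly they concentrate on NP-aligned target words.

    sentences:  list of target-language token lists
    alignments: per sentence, a list of (source_index, target_index) links
    np_spans:   per sentence, source-side NP spans as (start, end), end exclusive
    All three inputs are assumed to come from external tools (chunker, aligner).
    """
    inside, outside = Counter(), Counter()
    for tgt_tokens, links, spans in zip(sentences, alignments, np_spans):
        # Source token indices covered by any noun phrase.
        np_src = {i for start, end in spans for i in range(start, end)}
        # Target tokens aligned to at least one source NP token.
        np_tgt = {t for s, t in links if s in np_src}
        for idx, tok in enumerate(tgt_tokens):
            bucket = inside if idx in np_tgt else outside
            bucket.update(suffixes(tok.lower(), max_len))

    scores = {}
    for suf, count in inside.items():
        if count < min_count:
            continue  # hypothetical frequency filter against noise
        share = count / (count + outside[suf])  # fraction of occurrences inside NPs
        scores[suf] = (share, count)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hand-made): English source "the bird/dog is in the house/car".
sentences = [["lintu", "on", "talossa"], ["koira", "on", "autossa"]]
np_spans = [[(0, 2), (4, 6)], [(0, 2), (4, 6)]]          # "the bird", "the house"
alignments = [[(1, 0), (2, 1), (3, 2), (5, 2)]] * 2      # (source, target) links
ranked = rank_candidate_markers(sentences, alignments, np_spans)
print(ranked)
```

On this toy input the Finnish inessive ending `ssa` survives the filter, but so do its shorter tails `a` and `sa`. A realistic system would need far more data and additional filtering to separate true markers from such fragments.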
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Speech and dialogue systems
