CaMEL: Case Marker Extraction without Labels

Leonie Weissweiler; Valentin Hofmann; Masoud Jalili Sabet; Hinrich Schütze

arXiv:2203.10010·cs.CL·March 29, 2022

TL;DR

CaMEL is a new task: extracting case markers without labeled data. The paper's first model for it covers 83 languages, supporting linguistic analysis and low-resource language processing.

Contribution

The paper proposes the first model for the CaMEL task. The model uses a massively multilingual corpus, a noun phrase chunker, and an alignment system to identify case markers without labeled data.

Findings

01 Extracted case markers in 83 languages

02 Constructed a silver standard from UniMorph for evaluation

03 Enabled analysis of cross-linguistic case system similarities

Abstract

We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.
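
To make the idea of label-free marker extraction concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. It assumes that an upstream step (for example, a noun phrase chunker plus word alignment, as the abstract mentions) has already split target-language words into those inside noun phrases and those outside them. It then ranks word-final character n-grams that are over-represented inside noun phrases. The function names, thresholds, and the add-one smoothing are all hypothetical choices made for illustration.

```python
from collections import Counter


def suffix_counts(words, max_len=3):
    """Count word-final character n-grams (length 1..max_len), always leaving at least one stem character."""
    counts = Counter()
    for w in words:
        for n in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-n:]] += 1
    return counts


def candidate_markers(np_words, other_words, max_len=3, min_count=2, min_ratio=2.0):
    """Rank suffixes that are more frequent on NP words than on other words.

    Hypothetical scoring: ratio of per-word suffix rates, with add-one
    smoothing on the non-NP side to avoid division by zero.
    """
    np_counts = suffix_counts(np_words, max_len)
    other_counts = suffix_counts(other_words, max_len)
    np_total = max(len(np_words), 1)
    other_total = len(other_words)
    scored = []
    for suffix, count in np_counts.items():
        if count < min_count:
            continue  # too rare to be a reliable candidate
        np_rate = count / np_total
        other_rate = (other_counts[suffix] + 1) / (other_total + 1)
        ratio = np_rate / other_rate
        if ratio >= min_ratio:
            scored.append((suffix, ratio))
    # Highest ratio first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy usage with made-up word lists (not data from the paper).
np_words = ["domus", "servus", "amicus", "dominus"]
other_words = ["videt", "amat", "currit", "et"]
for suffix, ratio in candidate_markers(np_words, other_words):
    print(suffix, round(ratio, 2))
```

A frequency ratio like this is only one possible filter. It cannot distinguish case markers from other nominal suffixes, so it mainly illustrates why the noun phrase restriction is a useful signal.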

Code & Models

Repositories

leonieweissweiler/camel (official)

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution · Speech and dialogue systems