How Universal is Genre in Universal Dependencies?
Max Müller-Eberstein, Rob van der Goot, Barbara Plank

TL;DR
This paper analyzes genre in Universal Dependencies (UD), which covers 18 genres across 114 languages, and proposes weakly supervised methods for predicting genre at the instance level. The analysis shows that treebank-level genre metadata alone is a noisy signal for treebank selection.
Contribution
The paper introduces four weakly supervised methods for instance-level genre prediction in UD that outperform competitive baselines, and it analyzes the limitations of treebank-level genre metadata.
Findings
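The proposed methods recover instance-level genre better than competitive baselines on a subset of UD with labeled instances, and adhere better to the global expected distribution.
Genre metadata alone is a noisy signal and must be disentangled within treebanks.
UD's 18 genres vary in specificity and are spread across 114 languages, which makes it difficult to treat genre labels as universal.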
Abstract
This work provides the first in-depth analysis of genre in Universal Dependencies (UD). In contrast to prior work on genre identification, which uses small sets of well-defined labels in mono-/bilingual setups, UD contains 18 genres with varying degrees of specificity spread across 114 languages. As most treebanks are labeled with multiple genres while lacking annotations about which instances belong to which genre, we propose four methods for predicting instance-level genre using weak supervision from treebank metadata. The proposed methods recover instance-level genre better than competitive baselines as measured on a subset of UD with labeled instances and adhere better to the global expected distribution. Our analysis sheds light on prior work using UD genre metadata for treebank selection, finding that metadata alone are a noisy signal and must be disentangled within treebanks…
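To make the weak-supervision setting concrete, the following minimal sketch shows one way treebank-level metadata can constrain instance-level predictions: each sentence may only receive a genre that its treebank lists. This is an illustration of the problem setup, not one of the paper's four methods. The function name, the genre names, and the per-sentence scores are all hypothetical; in practice the scores would come from some sentence-level model.

```python
from typing import Dict, List


def constrain_to_treebank_genres(
    scores: Dict[str, float], treebank_genres: List[str]
) -> str:
    """Pick the highest-scoring genre among those listed in the treebank metadata.

    `scores` maps genre names to hypothetical per-sentence scores;
    `treebank_genres` is the genre set the treebank declares as a whole.
    """
    if not treebank_genres:
        raise ValueError("treebank metadata lists no genres")
    # Genres missing from `scores` are treated as maximally unlikely.
    allowed = {g: scores.get(g, float("-inf")) for g in treebank_genres}
    return max(allowed, key=allowed.get)


# Hypothetical treebank-level metadata: the treebank is labeled with two genres,
# but no sentence carries its own genre label.
treebank_genres = ["news", "wiki"]

# Hypothetical per-sentence scores over a larger genre inventory.
instance_scores = [
    {"news": 0.2, "wiki": 0.5, "fiction": 0.9},
    {"news": 0.7, "wiki": 0.1, "fiction": 0.3},
]

predictions = [
    constrain_to_treebank_genres(s, treebank_genres) for s in instance_scores
]
print(predictions)
```

In the first sentence, "fiction" has the highest raw score but is ruled out because the treebank metadata does not list it. This is the sense in which the metadata acts as a weak, treebank-level signal rather than as an instance-level label.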
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Text Readability and Simplification
