Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles
Ben Eyal, Michael Elhadad

TL;DR
This paper introduces a semi-automatically created Hebrew semantic role labeling resource derived from parallel movie subtitles, enabling improved Hebrew SRL modeling and providing a baseline for future research.
Contribution
It presents the first Hebrew SRL resource built via annotation projection from English subtitles, including comprehensive annotations and a baseline neural model.
Findings
Created a Hebrew SRL dataset from OpenSubtitles
Developed a baseline neural SRL model using multilingual BERT
Provided code adaptable to other languages for SRL resource creation
Abstract
We present a semantic role labeling resource for Hebrew built semi-automatically through annotation projection from English. This corpus is derived from the multilingual OpenSubtitles dataset and includes short informal sentences, for which reliable linguistic annotations have been computed. We provide a fully annotated version of the data including morphological analysis, dependency syntax, and semantic role labeling in both FrameNet and PropBank styles. Sentences are aligned between English and Hebrew; both sides include full annotations and the explicit mapping from the English arguments to the Hebrew ones. We train a neural SRL model on this Hebrew resource, exploiting the pre-trained multilingual BERT transformer model, and provide the first available baseline model for Hebrew SRL as a reference point. The code we provide is generic and can be adapted to other languages to bootstrap…
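To make the idea of annotation projection concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that each English argument is given as a list of token indices and that a word alignment is available as (English index, Hebrew index) pairs. The function name `project_roles` and the toy sentence pair are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def project_roles(
    src_roles: Dict[str, List[int]],
    alignment: List[Tuple[int, int]],
) -> Dict[str, List[int]]:
    """Project role spans from source tokens to target tokens.

    src_roles maps a role label (e.g. "ARG0") to source token indices.
    alignment is a list of (source_index, target_index) pairs.
    Roles whose tokens have no aligned target token are dropped.
    """
    # Build a lookup from each source token to its aligned target tokens.
    src_to_tgt: Dict[int, List[int]] = {}
    for src_idx, tgt_idx in alignment:
        src_to_tgt.setdefault(src_idx, []).append(tgt_idx)

    projected: Dict[str, List[int]] = {}
    for role, src_indices in src_roles.items():
        # Collect every target token aligned to any token of this argument.
        tgt_indices = sorted(
            {t for s in src_indices for t in src_to_tgt.get(s, [])}
        )
        if tgt_indices:
            projected[role] = tgt_indices
    return projected


# Toy example (hypothetical data): "She bought a book" / "היא קנתה ספר"
english = ["She", "bought", "a", "book"]
hebrew = ["היא", "קנתה", "ספר"]
# The English article "a" has no Hebrew counterpart, so it is unaligned.
alignment = [(0, 0), (1, 1), (3, 2)]
english_roles = {"ARG0": [0], "V": [1], "ARG1": [2, 3]}

hebrew_roles = project_roles(english_roles, alignment)
```

In this toy case the unaligned article is simply skipped, so the projected ARG1 covers only the Hebrew noun. A real pipeline would also need to handle noisy alignments and arguments that split into discontiguous target spans.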
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Residual Connection · Label Smoothing
