A Bayesian Model of Multilingual Unsupervised Semantic Role Induction
Nikhil Garg, James Henderson

TL;DR
This paper introduces a Bayesian model for unsupervised semantic role induction across multiple languages, leveraging parallel corpora and annotations to improve role alignment and induction accuracy.
Contribution
It presents a joint Bayesian framework that models multiple languages simultaneously and explores the impact of parallel data and annotations on semantic role induction.
Findings
Adding parallel corpora increases monolingual data benefits.
Alignments across languages yield small improvements.
More monolingual data has a larger impact than cross-lingual alignments.
Abstract
We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in mono-lingual data, with the alignments to another language resulting in small…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
