Empirical Gaussian priors for cross-lingual transfer learning

Anders Søgaard

arXiv:1601.02166 · cs.CL · January 12, 2016

TL;DR

This paper introduces empirical Gaussian priors derived from source language models to improve cross-lingual POS tagging, reducing overfitting and enhancing transfer learning performance.

Contribution

It proposes using the k source-language models to estimate the parameters of a Gaussian prior over the target-language tagger's parameters. This approach outperforms standard norm-based regularization and model interpolation.
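The abstract below contrasts this approach with norm-based regularization, which uses a zero-mean, constant-width prior. The prior here is instead estimated from the source models. Below is a minimal sketch in Python under several assumptions:

- Each source model is a sparse linear model stored as a dict of feature weights over a shared feature space.
- The prior is estimated per feature as the mean and population variance across the k sources.

The function names, the feature strings and the `min_var` floor are hypothetical illustrations. They are not the paper's code.

```python
from statistics import mean, pvariance


def empirical_gaussian_prior(source_weights, min_var=1e-2):
    """Estimate a per-feature Gaussian prior from k source-language models.

    source_weights: list of k dicts mapping feature -> weight.
    Returns (mu, var): per-feature mean and population variance across sources.
    """
    features = set().union(*source_weights)
    mu, var = {}, {}
    for f in features:
        # Assumption: a feature absent from a source model has weight 0.0 there.
        values = [w.get(f, 0.0) for w in source_weights]
        mu[f] = mean(values)
        # Assumption: floor the variance so features on which all sources agree
        # do not get an infinitely narrow prior.
        var[f] = max(pvariance(values), min_var)
    return mu, var


def neg_log_prior(weights, mu, var, default_var=1.0):
    """Negative log Gaussian prior, up to a constant: sum_j (w_j - mu_j)^2 / (2 var_j).

    Features the sources never saw fall back to a zero-mean prior with default_var.
    """
    total = 0.0
    for f in set(weights) | set(mu):
        diff = weights.get(f, 0.0) - mu.get(f, 0.0)
        total += diff * diff / (2.0 * var.get(f, default_var))
    return total


def prior_gradient(weights, mu, var, default_var=1.0):
    """Gradient of neg_log_prior: (w_j - mu_j) / var_j, pulling w_j towards mu_j."""
    return {
        f: (weights.get(f, 0.0) - mu.get(f, 0.0)) / var.get(f, default_var)
        for f in set(weights) | set(mu)
    }


# Usage: three toy source models over two hypothetical feature names.
sources = [
    {"suffix=ing|VERB": 2.0, "prev=the|NOUN": 1.0},
    {"suffix=ing|VERB": 1.0, "prev=the|NOUN": 1.0},
    {"suffix=ing|VERB": 3.0},
]
mu, var = empirical_gaussian_prior(sources)
print(mu["suffix=ing|VERB"], round(var["suffix=ing|VERB"], 4))  # 2.0 0.6667
```

Adding `neg_log_prior` to the training loss gives a MAP objective. Setting every mean to 0 and every variance to 1/(2λ) recovers the usual λ‖w‖² penalty. That is the zero-mean, constant-width special case the abstract refers to.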

Findings

01

Empirical Gaussian priors significantly improve accuracy in multi-source transfer set-ups.

02

Drop-out with Gaussian noise further enhances model robustness.

03

Lower Rademacher complexity indicates better generalization.
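For reference, here are the standard textbook definition of Rademacher complexity and a standard generalization bound. These are not taken from the paper, and its exact argument is not visible in the truncated abstract.

In the bound, f is read as the loss of a tagger on one example, assumed to lie in [0, 1]. A smaller complexity term tightens the gap between training loss and expected loss. This is the sense in which lower complexity suggests better generalization.

```latex
% Empirical Rademacher complexity of a class F on a sample S = (z_1, ..., z_m),
% with sigma_1, ..., sigma_m drawn i.i.d. uniformly from {-1, +1}:
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i)\Big],
\qquad
\mathfrak{R}_m(\mathcal{F}) = \mathbb{E}_{S}\big[\hat{\mathfrak{R}}_S(\mathcal{F})\big].

% For F mapping into [0, 1]: with probability at least 1 - \delta over S, for all f in F,
\mathbb{E}[f(z)] \le \frac{1}{m} \sum_{i=1}^{m} f(z_i) + 2\,\mathfrak{R}_m(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2m}}.
```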

Abstract

Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss + norm). In cross-lingual part-of-speech (POS) tagging, our target language training data consists of sequences of sentences with word-by-word labels projected from translations in $k$ languages for which we have labeled data, via word alignments. Our training data is therefore very noisy, and if Rademacher complexity is high, learning algorithms are prone to overfit. Norm-based regularization assumes a constant width and zero mean prior. We instead propose to use the $k$ source language models to estimate the parameters of a Gaussian prior for learning new POS taggers. This leads to significantly better performance in multi-source transfer set-ups. We also present a drop-out version that injects (empirical) Gaussian noise during online learning. Finally, we note…
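The abstract mentions a drop-out version that injects (empirical) Gaussian noise during online learning, but it does not spell out the procedure. The sketch below is one hypothetical reading, not the paper's algorithm:

- It uses a token-level multiclass perceptron, a simplification that drops the sequence structure of a real tagger.
- The perceptron starts from the prior mean.
- Before each prediction, it adds zero-mean Gaussian noise to each active weight. The noise's standard deviation comes from the empirical source-model variance.

This is additive weight noise, swapped in as a stand-in. All names and the `noise_scale` parameter are illustrative.

```python
import random


def predict(features, weights, labels):
    """Return the label with the highest linear score; weights maps (feature, label) -> float."""
    return max(labels, key=lambda y: sum(weights.get((f, y), 0.0) for f in features))


def train_noisy_perceptron(data, labels, mu, var, epochs=5, noise_scale=1.0, seed=0):
    """Hypothetical online learner with empirical Gaussian weight noise.

    data: list of (features, gold_label) pairs, one per token (no sequence structure).
    mu, var: per-(feature, label) prior mean and variance estimated from source models.
    """
    rng = random.Random(seed)
    weights = dict(mu)  # Assumption: start from the empirical prior mean.
    for _ in range(epochs):
        for features, gold in data:
            # Perturb only the weights active for this token; the noise std is the
            # empirical source-model std (0.0 for features the sources never saw).
            noisy = {}
            for f in features:
                for y in labels:
                    key = (f, y)
                    std = noise_scale * var.get(key, 0.0) ** 0.5
                    noisy[key] = weights.get(key, 0.0) + rng.gauss(0.0, std)
            guess = predict(features, noisy, labels)
            if guess != gold:
                # Standard perceptron update, applied to the clean (un-noised) weights.
                for f in features:
                    weights[(f, gold)] = weights.get((f, gold), 0.0) + 1.0
                    weights[(f, guess)] = weights.get((f, guess), 0.0) - 1.0
    return weights


# Usage: a prior that wrongly favours NOUN for "suffix=ing" is corrected by the data.
data = [(["suffix=ing"], "VERB"), (["prev=the"], "NOUN")]
labels = ["NOUN", "VERB"]
w = train_noisy_perceptron(data, labels, mu={("suffix=ing", "NOUN"): 0.5}, var={})
print(predict(["suffix=ing"], w, labels))  # VERB
```

With an empty `var`, the noise vanishes and the loop reduces to a plain perceptron. Features on which the source models disagree receive larger perturbations during training.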

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis