TL;DR
This paper introduces a mutual information maximization approach to part-of-speech induction. It compares two training objectives that can be optimized with SGD, shows that the variational lower bound is robust to gradient noise, and reports competitive performance across datasets.
Contribution
It proposes a novel generalization of the Brown clustering objective and analyzes how robust this objective and a variational lower bound are to gradient noise. It achieves effective POS induction with a simple architecture.
Findings
The variational lower bound is more robust to gradient noise.
The generalized Brown objective is vulnerable to noise.
The approach achieves competitive results across multiple datasets.
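The findings concern noise in SGD updates. The sketch below illustrates one general statistical reason why an objective built from count or probability estimates can behave badly on minibatches: a plug-in mutual information estimate is a nonlinear function of those estimates, so averaging it over small minibatches does not recover the full-data value. This is an assumption-labeled illustration of a textbook effect, not the paper's analysis. The data are hypothetical, with label and context drawn independently so that the true mutual information is 0.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(Y; Z) in nats from a list of (label, context) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(y for y, _ in pairs)
    right = Counter(z for _, z in pairs)
    mi = 0.0
    for (y, z), count in joint.items():
        p_yz = count / n
        mi += p_yz * math.log(p_yz / ((left[y] / n) * (right[z] / n)))
    return mi


def minibatch_average_mi(pairs, batch_size):
    """Average of plug-in MI estimates computed separately on each minibatch."""
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    return sum(mutual_information(b) for b in batches) / len(batches)


# Hypothetical data: 5 labels and 5 contexts sampled independently (true MI = 0).
rng = random.Random(0)
data = [(rng.randrange(5), rng.randrange(5)) for _ in range(20000)]

full = mutual_information(data)
small = minibatch_average_mi(data, batch_size=10)
print(f"full-data estimate: {full:.4f}  mean over minibatches of 10: {small:.4f}")
```

The full-data estimate stays close to the true value of 0. The minibatch average is inflated well above it, because each small batch overfits its own empirical distribution.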
Abstract
We address part-of-speech (POS) induction by maximizing the mutual information between the induced label and its context. We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. While both objectives are subject to noise in gradient updates, we show through analysis and experiments that the variational lower bound is robust whereas the generalized Brown objective is vulnerable. We obtain competitive performance on a multitude of datasets and languages with a simple architecture that encodes morphology and context.
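The classical Brown clustering objective that the paper generalizes is the mutual information between the clusters of adjacent words. The sketch below computes that quantity from counts for a fixed hard clustering, which shows what "maximizing mutual information between a label and its context" measures. It is a minimal sketch under stated assumptions: the toy corpus and the cluster maps are hypothetical. It covers only the count-based classical quantity, not the paper's SGD-trained objectives or its morphology and context encoders.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(Y; Z) in nats from a list of (label, context) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(y for y, _ in pairs)
    right = Counter(z for _, z in pairs)
    mi = 0.0
    for (y, z), count in joint.items():
        p_yz = count / n
        mi += p_yz * math.log(p_yz / ((left[y] / n) * (right[z] / n)))
    return mi


def brown_style_objective(tokens, cluster_of):
    """MI between the clusters of each token and the token that follows it."""
    pairs = [(cluster_of[a], cluster_of[b]) for a, b in zip(tokens, tokens[1:])]
    return mutual_information(pairs)


# Hypothetical toy corpus and two hypothetical clusterings of its vocabulary.
tokens = "the dog runs the cat sleeps".split()
pos_like = {"the": "DET", "dog": "N", "cat": "N", "runs": "V", "sleeps": "V"}
one_cluster = {w: "X" for w in pos_like}

score_pos = brown_style_objective(tokens, pos_like)
score_one = brown_style_objective(tokens, one_cluster)
print(f"POS-like clustering: {score_pos:.4f} nats, single cluster: {score_one:.4f} nats")
```

A clustering that groups words by syntactic role makes the next cluster predictable, so it scores higher (about 1.0549 nats here). Putting every word in one cluster carries no information and scores exactly 0.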
