Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

Karl Stratos

arXiv:1804.07849·cs.CL·April 4, 2019

TL;DR

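This paper approaches part-of-speech induction by maximizing the mutual information between an induced label and its context. It compares two training objectives suited to SGD, shows that the variational lower bound is robust to gradient noise while the generalized Brown objective is not, and reports competitive performance across many datasets and languages.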

Contribution

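The paper proposes a novel generalization of the classical Brown clustering objective, analyzes how robust that objective and a variational lower bound are to noise in gradient updates, and achieves effective POS induction with a simple architecture that encodes morphology and context.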

Findings

01

The variational lower bound is more robust to gradient noise.

02

The generalized Brown objective is vulnerable to noise in gradient updates.

03

The approach achieves competitive results across multiple datasets.

Abstract

We address part-of-speech (POS) induction by maximizing the mutual information between the induced label and its context. We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. While both objectives are subject to noise in gradient updates, we show through analysis and experiments that the variational lower bound is robust whereas the generalized Brown objective is vulnerable. We obtain competitive performance on a multitude of datasets and languages with a simple architecture that encodes morphology and context.
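As a rough illustration of the quantity being maximized, the sketch below computes a plug-in estimate of mutual information between two sets of soft label predictions on a small batch. It is not taken from the paper or its repository: the function names `joint_from_soft_assignments` and `mutual_information`, the pairing of a word-side prediction with a context-side prediction, and the plain-Python implementation are all assumptions made for this example. Estimating the joint from a small batch is exactly where noise of the kind the abstract mentions can enter.

```python
import math


def joint_from_soft_assignments(p_rows, q_rows):
    """Estimate a joint label distribution from paired soft predictions.

    p_rows[n] and q_rows[n] are probability vectors for the n-th example
    (hypothetically, one predicted from the word and one from its context).
    The estimate is the batch average of their outer products.
    """
    n = len(p_rows)
    k1, k2 = len(p_rows[0]), len(q_rows[0])
    joint = [[0.0] * k2 for _ in range(k1)]
    for p, q in zip(p_rows, q_rows):
        for i in range(k1):
            for j in range(k2):
                joint[i][j] += p[i] * q[j] / n
    return joint


def mutual_information(joint):
    """Plug-in mutual information (in nats) of a joint probability table."""
    row = [sum(r) for r in joint]        # marginal over the first label
    col = [sum(c) for c in zip(*joint)]  # marginal over the second label
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0.0:  # zero cells contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


# Two examples whose word-side and context-side predictions agree perfectly.
p_rows = [[1.0, 0.0], [0.0, 1.0]]
q_rows = [[1.0, 0.0], [0.0, 1.0]]
print(mutual_information(joint_from_soft_assignments(p_rows, q_rows)))
```

When the two predictions always agree and both labels are used equally often, the estimate reaches its maximum for two labels, log 2 nats. When the predictions are independent, it drops to zero.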

Code & Models

Repositories

karlstratos/mmi-tagger
PyTorch · Official
