Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

Adhiguna Kuncoro; Miguel Ballesteros; Lingpeng Kong; Chris Dyer; Noah A. Smith

arXiv:1609.07561 · cs.CL · September 27, 2016

TL;DR

This paper combines ensembling with model distillation for dependency parsing: an ensemble of greedy LSTM transition-based parsers is distilled into a single first-order graph-based parser that matches or surpasses previous state-of-the-art results across multiple languages.

Contribution

It introduces a consensus parser built from an ensemble of independently trained greedy parsers, and a distillation technique whose structured hinge loss uses a cost that incorporates ensemble uncertainty for each possible attachment.

Findings

01. The ensemble consensus parser improves parsing accuracy.

02. The distilled parser matches or surpasses previous state-of-the-art models.

03. Incorporating ensemble uncertainty into the training cost benefits structured prediction.

Abstract

We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a "distillation" of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the…
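The consensus idea in the abstract can be illustrated with a small Python sketch. It assumes each parser's output is a list of head indices (a hypothetical representation, not taken from the paper's code) and estimates, for every attachment, the fraction of ensemble members that voted for it. Under the Hamming cost, the expected cost of a candidate tree is the sum over words of one minus that fraction, so minimum Bayes risk decoding amounts to finding the highest-scoring spanning tree under these vote fractions. The spanning-tree search itself is omitted here, and the function names are illustrative only.

```python
from collections import Counter


def arc_vote_fractions(ensemble_heads):
    """Estimate q(head, modifier) as the fraction of parsers predicting that arc.

    ensemble_heads: one head sequence per parser; heads[m - 1] is the head of
    word m (words are 1-indexed, 0 denotes the artificial root).
    """
    n_parsers = len(ensemble_heads)
    votes = Counter()
    for heads in ensemble_heads:
        for modifier, head in enumerate(heads, start=1):
            votes[(head, modifier)] += 1
    return {arc: count / n_parsers for arc, count in votes.items()}


def expected_hamming_cost(candidate_heads, q):
    """Expected number of wrong attachments of a candidate tree under q."""
    return sum(
        1.0 - q.get((head, modifier), 0.0)
        for modifier, head in enumerate(candidate_heads, start=1)
    )


def consensus_per_word(q, n_words):
    """Strongest vote fraction per word; low values flag hard or ambiguous words."""
    best = [0.0] * n_words
    for (head, modifier), fraction in q.items():
        best[modifier - 1] = max(best[modifier - 1], fraction)
    return best


# Toy example (made-up data): three parsers, a three-word sentence.
ensemble = [
    [2, 0, 2],
    [2, 0, 2],
    [2, 0, 1],  # this parser disagrees about the head of word 3
]
q = arc_vote_fractions(ensemble)
majority_cost = expected_hamming_cost([2, 0, 2], q)
minority_cost = expected_hamming_cost([2, 0, 1], q)
```

In this toy example the parsers disagree only on the third word, so its consensus is weaker than that of the other two, and the majority tree has the lower expected cost. The same per-attachment fractions are one plausible ingredient for an uncertainty-aware training cost of the kind the abstract describes; the exact cost used in the paper is not reproduced here.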

Code & Models

Repositories

adhigunasurya/distillation_parser (Official)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory