Learning Approximate Inference Networks for Structured Prediction
Lifu Tu, Kevin Gimpel

TL;DR
This paper trains a neural "inference network" to perform approximate argmax inference for structured prediction energy networks, replacing gradient descent at inference time. It reports 10-60x speed-ups with improved accuracy on multi-label classification.
Contribution
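The paper proposes a neural inference network trained to approximate structured argmax inference, replacing gradient-descent inference in structured prediction energy networks (SPENs). It also develops large-margin criteria for jointly training the energy function and the inference network.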
Findings
10-60x speed-up over Belanger et al. (2017) in multi-label classification, with improved accuracy
Comparable speed and accuracy to exact inference in sequence labeling
Improved accuracy with a label language model for long-distance dependencies
Abstract
Structured prediction energy networks (SPENs; Belanger & McCallum, 2016) use neural network architectures to define energy functions that can capture arbitrary dependencies among parts of structured outputs. Prior work used gradient descent for inference, relaxing the structured output to a set of continuous variables and then optimizing the energy with respect to them. We replace this use of gradient descent with a neural network trained to approximate structured argmax inference. This "inference network" outputs continuous values that we treat as the output structure. We develop large-margin training criteria for joint training of the structured energy function and inference network. On multi-label classification we report speed-ups of 10-60x compared to Belanger et al. (2017) while also improving accuracy. For sequence labeling with simple structured energies, our approach performs…
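The contrast between the two inference strategies described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the quadratic energy, the single-layer sigmoid inference network, the toy sizes, and all names (`energy`, `gd_inference`, `infnet`, `train_infnet`) are hypothetical and are not the paper's architecture. The energy is also held fixed, whereas the paper trains the energy function and the inference network jointly with large-margin criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, L = 64, 5, 4  # toy sizes: examples, input features, labels

# Hypothetical energy (not the paper's SPEN):
#   E(x, y) = -y . (W x) + 0.5 * y^T C y, with y relaxed to [0, 1]^L.
W = rng.normal(size=(L, D))
M = rng.normal(size=(L, L))
C = M @ M.T / L + np.eye(L)  # symmetric positive definite, so E is convex in y

def energy(x, y):
    return -y @ (W @ x) + 0.5 * y @ C @ y

def gd_inference(x, steps=200):
    """Baseline: projected gradient descent on the relaxed output y."""
    lr = 1.0 / np.linalg.eigvalsh(C).max()  # step size 1 / Lipschitz constant
    y = np.full(L, 0.5)
    for _ in range(steps):
        grad = C @ y - W @ x
        y = np.clip(y - lr * grad, 0.0, 1.0)
    return y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def infnet(A, b, X):
    """Inference network: a single forward pass gives relaxed outputs."""
    return sigmoid(X @ A.T + b)

def mean_energy(A, b, X):
    Y = infnet(A, b, X)
    return float(np.mean([energy(x, y) for x, y in zip(X, Y)]))

def train_infnet(X, steps=500, lr=0.1):
    """Full-batch gradient descent on the mean energy of the network's outputs."""
    A, b = np.zeros((L, D)), np.zeros(L)
    for _ in range(steps):
        Y = infnet(A, b, X)
        # dE/dy for each row, then the chain rule through the sigmoid.
        G = (Y @ C - X @ W.T) * Y * (1.0 - Y)
        A -= lr * G.T @ X / len(X)
        b -= lr * G.mean(axis=0)
    return A, b

X = rng.normal(size=(N, D))
mean_before = mean_energy(np.zeros((L, D)), np.zeros(L), X)
A, b = train_infnet(X)
mean_after = mean_energy(A, b, X)

x0 = X[0]
e_net = energy(x0, infnet(A, b, x0[None, :])[0])  # one forward pass
e_gd = energy(x0, gd_inference(x0))               # 200 iterative steps
print(f"mean energy: {mean_before:.3f} -> {mean_after:.3f}")
print(f"x0: infnet {e_net:.3f}, gradient descent {e_gd:.3f}")
```

In this toy setting the inference network replaces a 200-step optimization loop with one matrix multiply at test time, at the cost of returning an approximate rather than exact minimizer of the relaxed energy.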
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
