Improving Joint Training of Inference Networks and Structured Prediction Energy Networks

Lifu Tu; Richard Yuanzhe Pang; Kevin Gimpel

arXiv:1911.02891 · cs.CL · October 13, 2020 · 1 citation

TL;DR

This paper introduces strategies to stabilize and enhance joint training of energy-based models and inference networks for structured prediction, leading to improved performance on sequence labeling tasks.

Contribution

It proposes a compound training objective and joint parameterizations that enable more stable and effective learning of energy functions and inference networks.

Findings

01 Achieved stronger performance on sequence labeling tasks.

02 Demonstrated an easier training process than prior methods.

03 Showed benefits of incorporating global energy terms.

Abstract

Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger and McCallum, 2016). Tu and Gimpel (2018) developed an efficient framework for energy-based models by training "inference networks" to approximate structured inference instead of using gradient descent. However, their alternating optimization approach suffers from instabilities during training, requiring additional loss terms and careful hyperparameter tuning. In this paper, we contribute several strategies to stabilize and improve this joint training of energy functions and inference networks for structured prediction. We design a compound objective to jointly train both cost-augmented and test-time inference networks along with the energy function. We propose joint parameterizations for the inference networks that encourage them to capture complementary functionality during learning. We…

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning