Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

arXiv:2010.02357 · cs.CL · October 7, 2020

TL;DR

This paper studies surrogate gradient methods, particularly SPIGOT, for training latent structure models in natural language processing. It offers a new theoretical perspective on these estimators, proposes new algorithms, and provides empirical insight into when they work and when they fail.

Contribution

It offers a principled motivation for SPIGOT and related estimators, introduces new algorithms, and compares their performance with existing methods.

Findings

1. SPIGOT and STE can both be derived from the perspective of a pulled-back objective.

2. New algorithms in the same family outperform some existing estimators.

3. Empirical results reveal failure cases and practical insights for structured latent models.

Abstract

Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) and the recently proposed SPIGOT, a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight…
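To make the surrogate-gradient idea concrete, the following is a minimal sketch for the simplest case: a single unstructured latent variable over K categories, where the set of feasible outputs relaxes to the probability simplex. It assumes NumPy, and the function names (`argmax_onehot`, `project_simplex`, `ste_grad`, `spigot_grad`) and the step size `eta` are hypothetical illustrations, not code from the deep-spin/understanding-spigot repository. The forward pass uses the hard argmax. In the backward pass, the identity STE passes the downstream gradient through unchanged, while the SPIGOT-style surrogate takes a projected gradient step away from the argmax output and returns the difference between the argmax and that projected target. For genuinely structured latent variables, the simplex projection would have to be replaced by a projection onto the marginal polytope, which this sketch does not cover.

```python
import numpy as np


def argmax_onehot(scores):
    """Forward pass: hard argmax as a one-hot vector.

    Its gradient with respect to the scores is zero almost everywhere,
    which is why a surrogate gradient is needed for end-to-end training.
    """
    scores = np.asarray(scores, dtype=float)
    z = np.zeros_like(scores)
    z[np.argmax(scores)] = 1.0
    return z


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    # Last index where the sorted entry stays positive after thresholding.
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - tau, 0.0)


def ste_grad(grad_z):
    """Identity STE (hypothetical helper): treat dz/ds as the identity,
    so the gradient w.r.t. the scores equals the gradient w.r.t. z."""
    return np.array(grad_z, dtype=float)


def spigot_grad(scores, grad_z, eta=1.0):
    """SPIGOT-style surrogate for the unstructured (simplex) case.

    1. Compute z_hat = argmax(scores) as a one-hot vector.
    2. Take a gradient step on the downstream loss from z_hat and project
       the result back onto the simplex to obtain a target point.
    3. Use z_hat - target as the gradient with respect to the scores.
    """
    z_hat = argmax_onehot(scores)
    target = project_simplex(z_hat - eta * np.asarray(grad_z, dtype=float))
    return z_hat - target
```

For example, with `scores = [2.0, 1.0, 0.5]` and a downstream gradient `grad_z = [0.3, -0.2, 0.0]`, `spigot_grad` returns approximately `[0.267, -0.233, -0.033]`, whereas `ste_grad` returns the downstream gradient itself. Because both the one-hot argmax and the projected target lie on the simplex, the SPIGOT-style surrogate always sums to zero, unlike the identity STE.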

Code & Models

Repositories

deep-spin/understanding-spigot
PyTorch · Official