Structured Prediction: From Gaussian Perturbations to Linear-Time Principled Algorithms
Jean Honorio, Tommi Jaakkola

TL;DR
This paper analyzes, within the PAC-Bayes framework, a linear-time, sampling-based loss function for structured prediction, providing theoretical guarantees and explaining the empirical success of recent sampling-based approaches.
Contribution
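It offers a theoretical justification for using the maximum loss over random structured outputs in structured prediction, showing that this loss yields a tighter upper bound on the Gibbs decoder distortion than commonly used methods and is therefore a principled way to learn model parameters.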
Findings
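Under some technical conditions and up to statistical accuracy, this loss gives a tighter upper bound on the Gibbs decoder distortion than commonly used methods.
The method is linear-time in the number of random structured outputs and trivially parallelizable.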
Theoretical insights explain empirical success of recent sampling-based approaches.
Abstract
Margin-based structured prediction commonly uses a maximum loss over all possible structured outputs \cite{Altun03,Collins04b,Taskar03}. In natural language processing, recent work \cite{Zhang14,Zhang15} has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution. This method is linear-time in the number of random structured outputs and trivially parallelizable. We study this family of loss functions in the PAC-Bayes framework under Gaussian perturbations \cite{McAllester07}. Under some technical conditions and up to statistical accuracy, we show that this family of loss functions produces a tighter upper bound of the Gibbs decoder distortion than commonly used methods. Thus, using the maximum loss over random structured outputs is a principled way of learning the parameter of structured prediction models. Besides…
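The contrast the abstract draws, between a maximum over all structured outputs and a maximum over a few randomly sampled ones, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy feature map `phi`, the Hamming distortion, the margin-rescaled hinge form of the loss, and the uniform proposal distribution. The paper's exact loss and proposal may differ.

```python
import itertools
import random


def phi(x, y):
    """Hypothetical joint feature map for a binary labeling y of sequence x."""
    emission = sum(xi for xi, yi in zip(x, y) if yi == 1)
    agreement = sum(1 for a, b in zip(y, y[1:]) if a == b)
    return (emission, float(agreement))


def score(w, x, y):
    """Linear score <w, phi(x, y)>."""
    return sum(wj * fj for wj, fj in zip(w, phi(x, y)))


def hamming(y, y_hat):
    """Distortion: number of positions where the two labelings differ."""
    return sum(1 for a, b in zip(y, y_hat) if a != b)


def hinge_over(candidates, w, x, y):
    """Margin-rescaled hinge loss, maximized over the given candidates only."""
    gold = score(w, x, y)
    worst = max(hamming(y, c) + score(w, x, c) - gold for c in candidates)
    return max(0.0, worst)


def exhaustive_loss(w, x, y):
    """Maximum over all 2**len(x) labelings: exponential in sequence length."""
    return hinge_over(itertools.product((0, 1), repeat=len(x)), w, x, y)


def sampled_loss(w, x, y, num_samples, rng):
    """Maximum over num_samples labelings drawn independently from a uniform
    proposal (an assumption): cost is linear in num_samples."""
    samples = [tuple(rng.randint(0, 1) for _ in x) for _ in range(num_samples)]
    return hinge_over(samples, w, x, y)
```

A possible call, with made-up inputs, is `sampled_loss((1.0, 0.5), (0.5, -1.0, 2.0, 0.3), (1, 0, 1, 1), 5, random.Random(0))`. Because the sampled candidates are a subset of all outputs, the sampled value never exceeds the exhaustive one. The terms inside the maximum are independent of one another, so in this sketch they could be evaluated in parallel.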
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Topic Modeling
