Differentiable Dynamic Programming for Structured Prediction and Attention
Arthur Mensch, Mathieu Blondel

TL;DR
This paper introduces a differentiable approach to dynamic programming by smoothing the max operator, enabling its integration into neural networks for structured prediction and attention mechanisms.
Contribution
It proposes a novel smoothing technique for DP algorithms, making them differentiable and applicable within neural network training via backpropagation.
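As a rough illustration of what smoothing the max operator can look like, the sketch below assumes the regularizer is negative entropy scaled by a temperature `gamma`. Under that assumption the smoothed max is a scaled log-sum-exp and its gradient is a softmax. The function names and the `gamma` parameter are hypothetical and chosen for this example. The summary does not say which regularizer the authors use in practice.

```python
import numpy as np

def smoothed_max(x, gamma=1.0):
    """Entropy-regularized max: gamma * log(sum(exp(x / gamma))).

    Assumption for this sketch: the strongly convex regularizer is
    negative entropy scaled by gamma > 0.
    """
    x = np.asarray(x, dtype=float)
    m = x.max()  # subtract the max for numerical stability
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_argmax(x, gamma=1.0):
    """Gradient of smoothed_max with respect to x: softmax(x / gamma)."""
    x = np.asarray(x, dtype=float)
    z = np.exp((x - x.max()) / gamma)
    return z / z.sum()
```

As `gamma` shrinks toward zero, `smoothed_max` approaches the ordinary max and `smoothed_argmax` approaches a one-hot indicator of the largest entry. For any positive `gamma` both functions are differentiable in `x`.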
Findings
Smoothed Viterbi algorithm for sequence prediction.
Smoothed DTW algorithm for time-series alignment.
Effective structured prediction and attention in neural machine translation.
Abstract
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm…
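To make the idea of smoothing inside the DP recursion concrete, here is a minimal sketch of a smoothed DTW value computation. It assumes an entropy-regularized soft minimum (the negated log-sum-exp of the negated inputs) in place of the hard min in the standard DTW recursion. The helper names `soft_min` and `soft_dtw`, the `gamma` parameter, and the precomputed pairwise cost matrix are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    v = np.asarray(values, dtype=float)
    m = v.min()  # finite whenever at least one entry is finite
    return m - gamma * np.log(np.exp(-(v - m) / gamma).sum())

def soft_dtw(cost, gamma=1.0):
    """Smoothed DTW value for a pairwise cost matrix of shape (n, m).

    Same recursion as classical DTW, with the hard min over the three
    predecessor cells replaced by soft_min.
    """
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)  # R[i, j]: value of aligning prefixes
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]]
            R[i, j] = cost[i - 1, j - 1] + soft_min(prev, gamma)
    return R[n, m]
```

With a very small `gamma` the result is close to the classical DTW cost. Larger values of `gamma` blend contributions from many alignments, which is what makes the output differentiable with respect to the cost matrix.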
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Algorithms
Methods: Dynamic Time Warping
