Soft-DTW: a Differentiable Loss Function for Time-Series
Marco Cuturi, Mathieu Blondel

TL;DR
This paper introduces soft-DTW, a differentiable variant of dynamic time warping, enabling effective learning and clustering of time series by leveraging a smooth, gradient-friendly loss function.
Contribution
The authors develop soft-DTW, a differentiable loss function that allows gradient-based optimization for time series tasks. The abstract reports that it is particularly well suited to averaging and clustering time series under the DTW geometry.
Findings
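Soft-DTW is differentiable, and both its value and its gradient can be computed with quadratic time and space complexity.
It is particularly well suited to averaging and clustering time series under the DTW geometry.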
It enables effective parameter tuning for time series models.
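For readers who want the quantity behind these findings, the soft-minimum and the resulting loss can be written as follows. This is a sketch of the standard formulation; the symbols $\gamma$ (smoothing parameter), $\delta$ (pointwise cost), and $r_{i,j}$ (accumulated cost) are notation assumed here, since this page does not define any.

```latex
\[
\operatorname{min}^{\gamma}\{a_1,\dots,a_k\} =
\begin{cases}
\min_{i \le k} a_i, & \gamma = 0,\\[4pt]
-\gamma \log \sum_{i=1}^{k} e^{-a_i/\gamma}, & \gamma > 0,
\end{cases}
\]
\[
r_{i,j} = \delta(x_i, y_j) + \operatorname{min}^{\gamma}\{\, r_{i-1,j-1},\; r_{i-1,j},\; r_{i,j-1} \,\},
\qquad r_{0,0} = 0,\quad r_{i,0} = r_{0,j} = \infty \ \ (i, j \ge 1).
\]
```

With $\gamma = 0$ the recursion reduces to classic DTW; for $\gamma > 0$ every operation is smooth, which is what makes the final value $r_{n,m}$ differentiable with respect to the inputs.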
Abstract
We propose in this paper a differentiable learning loss between time series, building upon the celebrated dynamic time warping (DTW) discrepancy. Unlike the Euclidean distance, DTW can compare time series of variable size and is robust to shifts or dilatations across the time dimension. To compute DTW, one typically solves a minimal-cost alignment problem between two time series using dynamic programming. Our work takes advantage of a smoothed formulation of DTW, called soft-DTW, that computes the soft-minimum of all alignment costs. We show in this paper that soft-DTW is a differentiable loss function, and that both its value and gradient can be computed with quadratic time/space complexity (DTW has quadratic time but linear space complexity). We show that this regularization is particularly well suited to average and cluster time series under the DTW geometry, a task for which our…
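The dynamic program described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `softmin` and `soft_dtw`, the squared Euclidean pointwise cost, and the default `gamma=1.0` are all assumptions made for the example.

```python
import numpy as np


def softmin(values, gamma):
    """Soft minimum -gamma * log(sum(exp(-v / gamma))); hard min when gamma == 0."""
    values = np.asarray(values, dtype=float)
    if gamma == 0.0:
        return values.min()
    z = -values / gamma
    z_max = z.max()  # subtract the max before exponentiating for numerical stability
    return -gamma * (z_max + np.log(np.exp(z - z_max).sum()))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two series (assumed cost: squared Euclidean)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat 1-D input as a univariate series
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise cost matrix: cost[i, j] = ||x_i - y_j||^2
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # Accumulated cost table with an infinite border; O(n * m) time and space.
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i, j] = cost[i - 1, j - 1] + softmin(
                [r[i - 1, j - 1], r[i - 1, j], r[i, j - 1]], gamma
            )
    return r[n, m]
```

Calling `soft_dtw(x, y, gamma=0.0)` recovers the classic DTW value, while a positive `gamma` gives a smooth lower bound on it. Note that the two series may have different lengths, which is the property the abstract contrasts with the Euclidean distance.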
Taxonomy
Topics: Time Series Analysis and Forecasting · Music and Audio Processing · Advanced Text Analysis Techniques
Methods: Dynamic Time Warping
