Differentiable Combinatorial Losses through Generalized Gradients of Linear Programs
Xi Gao, Han Zhang, Aliakbar Panahi, Tom Arodz

TL;DR
This paper introduces a method for differentiating through combinatorial optimization problems that can be expressed as linear programs, such as global sequence alignment, enabling end-to-end training with structured objectives.
Contribution
It presents a way to perform gradient descent on combinatorial algorithms expressed as linear programs, bridging the gap between training objectives and inference goals.
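As a rough illustration of why a linear-programming formulation helps, consider a standard fact about linear programs (this summary does not say whether the paper uses exactly this formulation, and the symbols A, b, c, x below are assumptions introduced for illustration). For a cost vector c, the optimal value is concave and piecewise linear in c, and any optimal solution x*(c) serves as a generalized gradient (here, a supergradient) of that value:

```latex
\begin{align}
z^*(c) &= \min_{x} \; c^\top x
  \quad \text{s.t.} \quad A x = b,\; x \ge 0, \\
% x^*(c) stays feasible for any other cost vector c', so it upper-bounds z^*(c'):
z^*(c') &\le c'^\top x^*(c)
  = z^*(c) + (c' - c)^\top x^*(c), \\
% which is the defining inequality of a supergradient of the concave function z^*:
x^*(c) &\in \partial z^*(c).
\end{align}
```

When the optimum is unique, z* is differentiable at c and its gradient is exactly x*(c). For a combinatorial problem such as a shortest path, x*(c) is the indicator vector of the chosen edges.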
Findings
Effective sequence-to-sequence training with differentiable alignment.
Improved weakly supervised image classification results.
Demonstrated the usefulness of gradient descent over combinatorial optimization problems.
Abstract
When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. The natural training-time loss would involve a combinatorial problem -- dynamic programming-based global sequence alignment -- but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as linear programs. We demonstrate usefulness of gradient descent over combinatorial optimization in…
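The idea of taking gradients through a dynamic-programming alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `align_cost_and_grad`, the match-cost matrix `C`, and the constant `gap` penalty are all hypothetical. It computes a Needleman-Wunsch style global alignment cost and returns the 0/1 indicator of matched cells, which is a generalized gradient of the optimal cost with respect to `C` (the true gradient when the optimal alignment is unique).

```python
import numpy as np


def align_cost_and_grad(C, gap=1.0):
    """Global alignment that minimizes total cost (illustrative sketch).

    C[i, j] is the cost of matching item i of the first sequence with
    item j of the second; `gap` is the cost of skipping one item.
    Returns (optimal_cost, G), where G[i, j] = 1.0 if the optimal
    alignment matches i with j, else 0.0.
    """
    n, m = C.shape
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)  # align a prefix against nothing
    D[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(
                D[i - 1, j - 1] + C[i - 1, j - 1],  # match i with j
                D[i - 1, j] + gap,                  # skip item i
                D[i, j - 1] + gap,                  # skip item j
            )

    # Trace back one optimal path; matched cells form the generalized gradient.
    G = np.zeros_like(C, dtype=float)
    i, j = n, m
    while i > 0 and j > 0:
        if D[i, j] == D[i - 1, j - 1] + C[i - 1, j - 1]:
            G[i - 1, j - 1] = 1.0
            i, j = i - 1, j - 1
        elif D[i, j] == D[i - 1, j] + gap:
            i -= 1
        else:
            j -= 1
    return D[n, m], G


if __name__ == "__main__":
    C = np.array([[0.0, 2.0], [2.0, 0.0]])
    cost, G = align_cost_and_grad(C, gap=1.0)
    print(cost)
    print(G)
```

In an end-to-end setting, `C` would be produced by a model, and `G` would be passed backward through the chain rule in place of the undefined derivative of the discrete alignment.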
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Softmax
