# Differentiable Scheduled Sampling for Credit Assignment

**Authors:** Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick

arXiv: 1704.06970 · 2017-04-25

## TL;DR

This paper introduces a differentiable approximation to greedy decoding for seq2seq models and incorporates it into scheduled sampling training, improving results on named entity recognition and machine translation.

## Contribution

It proposes a continuous relaxation of the argmax operation that makes greedy decoding differentiable, and incorporates it into scheduled sampling to obtain a training objective that is continuous and differentiable everywhere.
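
To make the idea of relaxing argmax concrete, here is a minimal sketch, assuming the relaxation takes the form of a temperature-scaled ("peaked") softmax used to average embedding vectors; this summary does not state the exact form used in the paper. The names `peaked_softmax`, `soft_argmax_embedding`, and `alpha`, and all toy values, are hypothetical and chosen only for illustration.

```python
import math

def peaked_softmax(scores, alpha):
    """Softmax over alpha * scores; a larger alpha puts more mass on the max."""
    m = max(scores)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(alpha * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_argmax_embedding(scores, embeddings, alpha):
    """Weighted average of embedding rows under the peaked softmax.

    Assumption: this stands in for "the embedding of the argmax token".
    Unlike a hard argmax lookup, it varies smoothly with the scores.
    """
    weights = peaked_softmax(scores, alpha)
    dim = len(embeddings[0])
    return [sum(w * row[d] for w, row in zip(weights, embeddings))
            for d in range(dim)]

# Hypothetical toy setup: 3 vocabulary items, 2-dimensional embeddings.
scores = [1.0, 2.0, 0.5]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

soft = soft_argmax_embedding(scores, embeddings, alpha=1.0)    # a blend of all rows
sharp = soft_argmax_embedding(scores, embeddings, alpha=50.0)  # close to the argmax row
```

With a small `alpha` the result blends all embeddings; as `alpha` grows it approaches the embedding of the highest-scoring item, which is what a greedy decoder would feed to the next step.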

## Key findings

- Outperforms both cross-entropy training and standard scheduled sampling on two sequence prediction tasks.
- Improves over these baselines on named entity recognition.
- Improves over these baselines on machine translation.

## Abstract

We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015)--a well-known technique for correcting exposure bias--we introduce a new training objective that is continuous and differentiable everywhere and that can provide informative gradients near points where previous decoding decisions change their value. In addition, by using a related approximation, we demonstrate a similar approach to sampled-based training. Finally, we show that our approach outperforms cross-entropy training and scheduled sampling procedures in two sequence prediction tasks: named entity recognition and machine translation.
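
The abstract does not name the "related approximation" used for sample-based training. As a hedged illustration only, the sketch below assumes a Gumbel-softmax style relaxation, a widely used way to make categorical sampling continuous: scores (treated as unnormalized log-probabilities) are perturbed with Gumbel noise and then passed through a peaked softmax. The function names, the sharpness parameter `alpha`, and the toy values are hypothetical.

```python
import math
import random

def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform."""
    u = max(rng.random(), 1e-12)  # clamp away from 0 so log(u) is finite
    return -math.log(-math.log(u))

def relaxed_sample(scores, alpha, rng):
    """Return soft one-hot weights and the Gumbel-perturbed scores.

    Assumption: taking the argmax of the perturbed scores corresponds to
    drawing a sample from softmax(scores); the peaked softmax over them is
    a continuous stand-in for that one-hot sample.
    """
    noisy = [s + gumbel_noise(rng) for s in scores]
    m = max(noisy)
    exps = [math.exp(alpha * (v - m)) for v in noisy]  # shifted for stability
    total = sum(exps)
    return [e / total for e in exps], noisy

rng = random.Random(0)  # fixed seed so the example is reproducible
weights, noisy = relaxed_sample([1.0, 2.0, 0.5], alpha=20.0, rng=rng)

# The hard sample that the soft weights approximate.
hard_choice = max(range(len(noisy)), key=noisy.__getitem__)
```

The soft weights always place their largest mass on `hard_choice`, and they concentrate further on it as `alpha` increases.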

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.06970/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1704.06970/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1704.06970/full.md

---
Source: https://tomesphere.com/paper/1704.06970