# Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

**Authors:** Julia Kreutzer, Artem Sokolov, Stefan Riezler

arXiv: 1704.06497 · 2018-12-14

## TL;DR

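This paper extends bandit structured prediction to neural sequence-to-sequence models. The models learn from partial feedback, meaning a task loss evaluated on a single predicted output with no gold-standard reference. The paper reports improvements of up to 5.89 BLEU points for domain adaptation in neural machine translation from simulated bandit feedback.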

## Contribution

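The paper lifts linear bandit structured prediction to neural sequence-to-sequence learning with attention-based recurrent neural networks. It also shows how to incorporate control variates into the learning algorithms for variance reduction and improved generalization.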
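For orientation, the expected-loss objective behind bandit structured prediction and its single-sample stochastic gradient can be written as follows. The notation (task loss $\Delta$, model distribution $p_\theta$, baseline $b$) is chosen here for illustration, and the paper's exact update rules and control variates may differ. Only the loss of the sampled output $\tilde y$ appears in the estimate, which is what makes learning from bandit feedback possible.

$$
J(\theta) = \mathbb{E}_{x \sim p(x)}\, \mathbb{E}_{\tilde y \sim p_\theta(\cdot \mid x)}\big[\Delta(\tilde y)\big]
$$

$$
\nabla_\theta J(\theta) = \mathbb{E}_{x,\tilde y}\big[\Delta(\tilde y)\, \nabla_\theta \log p_\theta(\tilde y \mid x)\big],
\qquad
\log p_\theta(\tilde y \mid x) = \sum_{t=1}^{|\tilde y|} \log p_\theta(\tilde y_t \mid \tilde y_{<t}, x)
$$

$$
\hat g = \big(\Delta(\tilde y) - b\big)\, \nabla_\theta \log p_\theta(\tilde y \mid x),
\qquad \tilde y \sim p_\theta(\cdot \mid x)
$$

A baseline $b$ that does not depend on the sampled $\tilde y$ leaves $\hat g$ unbiased, because $\mathbb{E}_{\tilde y}[\nabla_\theta \log p_\theta(\tilde y \mid x)] = 0$. A baseline of this kind is one simple control variate.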

## Key findings

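- Improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback
- Control variates reduce the variance of the stochastic updates and improve generalization
- Evaluation on a neural machine translation task with attention-based recurrent sequence-to-sequence models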
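The variance-reduction finding can be made concrete on a toy example. This is a minimal sketch under stated assumptions, not the paper's code: a softmax over four fixed candidate outputs stands in for a sequence-to-sequence model, and the scores and task losses are hypothetical. It computes exactly, by enumerating all outcomes, the mean and total variance of the single-sample gradient estimate "(loss minus baseline) times the gradient of the log-probability of the sampled output", with and without a constant baseline. The function names are illustrative.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of scores."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def gradient_estimates(theta, losses, baseline):
    """Per-outcome estimates (loss_k - baseline) * grad_theta log p_k.

    For a softmax policy, grad_theta log p_k = onehot(k) - p.
    grads[k] is the estimate obtained when output k is the sampled one.
    """
    p = softmax(theta)
    K = len(p)
    grads = []
    for k in range(K):
        score = [(1.0 if j == k else 0.0) - p[j] for j in range(K)]
        grads.append([(losses[k] - baseline) * s for s in score])
    return p, grads

def mean_and_total_variance(theta, losses, baseline):
    """Exact mean vector and trace of the covariance, enumerating all K outcomes."""
    p, grads = gradient_estimates(theta, losses, baseline)
    K = len(p)
    mean = [sum(p[k] * grads[k][j] for k in range(K)) for j in range(K)]
    second_moment = sum(p[k] * sum(g * g for g in grads[k]) for k in range(K))
    return mean, second_moment - sum(m * m for m in mean)

# Hypothetical model scores and task losses (e.g. 1 - sentence-BLEU) for 4 candidates.
theta = [0.5, 0.0, -0.5, 1.0]
losses = [0.6, 0.7, 0.9, 0.4]
expected_loss = sum(pk * lk for pk, lk in zip(softmax(theta), losses))

mean0, var0 = mean_and_total_variance(theta, losses, baseline=0.0)
mean_b, var_b = mean_and_total_variance(theta, losses, baseline=expected_loss)
# For these numbers: mean_b equals mean0 (no bias added) and var_b < var0.
```

The expected loss is used as the baseline only for simplicity. In this setting the constant that minimizes the variance is a weighted average of the losses, weighted by each outcome's probability times the squared norm of its score vector, so other choices of constant can do slightly better.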

## Abstract

Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our learning algorithms for variance reduction and improved generalization. We present an evaluation on a neural machine translation task that shows improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback.
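The learning protocol described in the abstract (predict an output, observe only its task loss, update) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the "model" is a softmax over four fixed candidate outputs with hypothetical losses, and the update is a plain score-function (REINFORCE-style) stochastic gradient step on the expected loss, with a running average of past losses as a baseline control variate. `train_bandit` and its parameters are hypothetical names.

```python
import math
import random

def softmax(theta):
    """Numerically stable softmax over a list of scores."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def train_bandit(losses, steps=2000, lr=0.5, use_baseline=True, seed=0):
    """Learn a softmax policy over fixed candidate outputs from bandit feedback.

    Each step samples one output, observes only that output's loss
    (the gold output is never revealed), and takes an SGD step on the
    expected loss using the score-function gradient estimate.
    """
    rng = random.Random(seed)
    K = len(losses)
    theta = [0.0] * K
    avg_loss = 0.0  # running average of past losses, used as a baseline
    for t in range(1, steps + 1):
        p = softmax(theta)
        k = rng.choices(range(K), weights=p)[0]  # sample an output structure
        loss = losses[k]  # partial feedback: loss of the sampled output only
        b = avg_loss if use_baseline else 0.0
        for j in range(K):
            score = (1.0 if j == k else 0.0) - p[j]  # d/dtheta_j log p_k for softmax
            theta[j] -= lr * (loss - b) * score
        avg_loss += (loss - avg_loss) / t  # update baseline after the step
    return softmax(theta)

# Hypothetical per-candidate task losses; candidate 3 has the lowest loss.
losses = [0.6, 0.7, 0.9, 0.4]
p = train_bandit(losses)
best = max(range(len(p)), key=lambda k: p[k])
```

In a real sequence-to-sequence model, `theta` would be the network weights and the gradient of the log-probability would come from backpropagating through the decoder's per-token log-probabilities. The baseline is updated only after each step, so it never depends on the current sample and does not bias the estimate.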

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.06497/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1704.06497/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1704.06497/full.md

---
Source: https://tomesphere.com/paper/1704.06497