Using stochastic computation graphs formalism for optimization of sequence-to-sequence model

Eugene Golikov; Vlad Zhukov; Maksim Kretov

arXiv:1711.07724·cs.LG·December 18, 2017

TL;DR

This paper uses the stochastic computation graph (SCG) formalism to analyze the optimization of sequence-to-sequence models with attention, offering a unified view that could inspire new architectures with embedded stochastic nodes.

Contribution

It introduces a reformulation of sequence-to-sequence model optimization using stochastic computation graphs, unifying various approaches and aiding future architecture development.

Findings

1. Provides a formal analysis of sequence-to-sequence optimization
2. Reformulates the optimization problem using SCG formalism
3. Offers examples in machine translation to illustrate the approach

Abstract

A variety of machine learning problems can be formulated as the optimization of some (surrogate) loss function. The calculation of the loss function can be viewed in terms of stochastic computation graphs (SCG). We use this formalism to analyze the optimization problem for the well-known sequence-to-sequence model with attention and propose a reformulation of the task. Examples are given for machine translation (MT). Our work provides a unified view of different optimization approaches for sequence-to-sequence models and could help researchers develop new network architectures with embedded stochastic nodes.
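
As a rough illustration of what a stochastic node implies for optimization, the sketch below treats a single categorical choice (for instance, one sampled output token) as the only stochastic node in a graph and estimates the gradient of its expected cost with the standard score-function estimator. This is not the authors' code or their reformulation: the function names, logits, and cost values are hypothetical toy assumptions, and the exact gradient is included only as a check.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def exact_grad(logits, costs):
    # Exact gradient of E_{x ~ softmax(logits)}[costs[x]] w.r.t. each logit:
    # d/dz_k = p_k * (c_k - E[c]).
    p = softmax(logits)
    mean_cost = sum(pi * ci for pi, ci in zip(p, costs))
    return [pi * (ci - mean_cost) for pi, ci in zip(p, costs)]

def score_function_grad(logits, costs, n_samples, rng):
    # Monte Carlo estimate: average of costs[x] * d log p(x) / d logits,
    # where d log p(x) / d z_k = 1[k == x] - p_k for a softmax node.
    p = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        x = rng.choices(range(len(p)), weights=p)[0]  # sample the stochastic node
        for k in range(len(p)):
            indicator = 1.0 if k == x else 0.0
            grad[k] += costs[x] * (indicator - p[k])
    return [g / n_samples for g in grad]

# Hypothetical toy values: three possible outcomes with fixed costs.
toy_logits = [0.5, -0.2, 0.1]
toy_costs = [1.0, 3.0, 2.0]
estimate = score_function_grad(toy_logits, toy_costs, 20000, random.Random(0))
reference = exact_grad(toy_logits, toy_costs)
```

With enough samples, `estimate` should approach `reference`. The estimator only needs samples and log-probability gradients, which is why it applies even when the cost downstream of the node is not differentiable.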

Code & Models

Repositories

deepmipt/seq2seq_scg (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Advanced Graph Neural Networks