Using stochastic computation graphs formalism for optimization of sequence-to-sequence model
Eugene Golikov, Vlad Zhukov, Maksim Kretov

TL;DR
This paper applies the stochastic computation graphs formalism to optimize sequence-to-sequence models with attention, offering a unified perspective that could inspire new architectures with embedded stochastic components.
Contribution
It introduces a reformulation of sequence-to-sequence model optimization using stochastic computation graphs, unifying various approaches and aiding future architecture development.
Findings
Provides a formal analysis of sequence-to-sequence optimization
Reformulates the optimization problem using SCG formalism
Offers examples in machine translation to illustrate the approach
Abstract
A variety of machine learning problems can be formulated as the optimization of some (surrogate) loss function. Calculation of the loss function can be viewed in terms of stochastic computation graphs (SCG). We use this formalism to analyze the optimization of the well-known sequence-to-sequence model with attention and propose a reformulation of the task. Examples are given for machine translation (MT). Our work provides a unified view of different optimization approaches for sequence-to-sequence models and could help researchers develop new network architectures with embedded stochastic nodes.
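To make the idea of a loss computed through a stochastic node concrete, here is a minimal sketch. It is not taken from the paper. It assumes a single stochastic node that samples one token from a softmax over three logits, and a fixed per-token cost. It compares the exact gradient of the expected cost with a score-function (REINFORCE-style) Monte Carlo estimate, which is one standard way to differentiate through a stochastic node. All function names, logits, and costs are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, costs):
    """Gradient of E_{x ~ softmax(logits)}[cost(x)] with respect to the logits."""
    p = softmax(logits)
    expected = sum(pi * ci for pi, ci in zip(p, costs))
    # d/d theta_j of sum_i p_i c_i equals p_j * (c_j - E[c]).
    return [pi * (ci - expected) for pi, ci in zip(p, costs)]


def score_function_gradient(logits, costs, num_samples, rng):
    """Monte Carlo estimate using cost(x) * grad log p(x)."""
    p = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        # Stochastic node: sample a token index from the categorical distribution.
        x = rng.choices(range(len(p)), weights=p)[0]
        for j in range(len(logits)):
            indicator = 1.0 if j == x else 0.0
            # d log p(x) / d theta_j = 1[x == j] - p_j for a softmax.
            grad[j] += costs[x] * (indicator - p[j])
    return [g / num_samples for g in grad]


# Hypothetical toy values: three candidate tokens with illustrative costs.
logits = [0.5, -0.2, 0.1]
costs = [1.0, 3.0, 0.0]
rng = random.Random(0)

exact = exact_gradient(logits, costs)
estimate = score_function_gradient(logits, costs, 20000, rng)
```

With enough samples the estimate should approach the exact gradient. In a sequence-to-sequence setting the same construction would apply per decoding step, with the cost coming from a sequence-level metric, but that extension is an assumption of this sketch rather than a description of the paper's formulation.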
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Advanced Graph Neural Networks
