Scaling Matters in Deep Structured-Prediction Models

Aleksandr Shevchenko; Anton Osokin

arXiv:1902.11088·cs.LG·March 1, 2019·1 cites

Open Access

TL;DR

This paper investigates the challenges of joint training in deep structured-prediction models, proposing scaling algorithms to improve training stability and effectiveness across multiple tasks.

Contribution

It introduces online and offline scaling algorithms to address normalization issues, enabling successful end-to-end training of deep energy-based models.

Findings

01. Scaling algorithms improve joint training stability

02. Algorithms outperform multistage training approaches

03. Effective across diverse tasks

Abstract

Deep structured-prediction energy-based models combine the expressive power of learned representations with the ability to embed knowledge about the task at hand into the system. A common way to learn the parameters of such models is a multistage procedure in which different combinations of components are trained at different stages. The joint end-to-end training of the whole system is then done as the last fine-tuning stage. This multistage approach is time-consuming and cumbersome, as it requires multiple runs until convergence and multiple rounds of hyperparameter tuning. From this point of view, it is beneficial to start the joint training procedure from the beginning. However, such approaches often unexpectedly fail and deliver results worse than the multistage ones. In this paper, we hypothesize that one reason for joint training of deep energy-based models to fail is the…

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Machine Learning and Data Classification