Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks

Nils Reimers; Iryna Gurevych

arXiv:1707.06799·cs.CL·August 17, 2017·265 cites

TL;DR

This paper systematically evaluates over 50,000 hyperparameter configurations for deep LSTM networks across five sequence labeling tasks, identifies the parameters that most affect performance, and gives recommendations for well-performing configurations.

Contribution

It offers a comprehensive analysis of hyperparameter importance for LSTM-based sequence labeling and proposes effective configuration guidelines based on extensive empirical evaluation.

Findings

01

The choice of pre-trained word embeddings and of the network's last layer has a large impact on performance.

02

The number of LSTM layers and the number of recurrent units have only a minor impact.

03

The recommended configurations improve performance across multiple NLP tasks.

Abstract

Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance. However, little is published on which parameters and design choices should be evaluated or selected, making the correct hyperparameter optimization often a "black art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks (POS, Chunking, NER, Entity Recognition, and Event Detection). We evaluated over 50,000 different setups and found that some parameters, like the pre-trained word embeddings or the last layer of the network, have a large impact on the performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance. We give a…
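
To make the idea of comparing many setups concrete, here is a minimal sketch of how one might sample configurations and then ask which hyperparameter moves the score most. It assumes simple random sampling over a small search space. The names `SEARCH_SPACE`, `sample_config`, `mean_score_by_value`, and `spread`, and all the listed values, are illustrative assumptions and are not taken from the paper.

```python
import random

# Hypothetical search space: the names and values below are illustrative
# assumptions, not the setups evaluated in the paper.
SEARCH_SPACE = {
    "word_embeddings": ["embeddings_a", "embeddings_b"],
    "last_layer": ["softmax", "crf"],
    "lstm_layers": [1, 2, 3],
    "recurrent_units": [50, 100, 200],
}


def sample_config(rng):
    """Draw one configuration by picking a value for every hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mean_score_by_value(results, name):
    """Average the scores of all runs that share a value of `name`.

    `results` is a list of (config, score) pairs.
    """
    grouped = {}
    for config, score in results:
        grouped.setdefault(config[name], []).append(score)
    return {value: sum(scores) / len(scores) for value, scores in grouped.items()}


def spread(results, name):
    """Gap between the best and worst mean score across the values of `name`."""
    means = mean_score_by_value(results, name)
    return max(means.values()) - min(means.values())


# Sample a handful of configurations with a fixed seed for reproducibility.
rng = random.Random(0)
configs = [sample_config(rng) for _ in range(20)]
```

Given scores from a training routine of your own (not shown here), one could build `results = [(cfg, score), ...]` and compare `spread(results, "last_layer")` against `spread(results, "lstm_layers")`. A larger spread suggests a parameter matters more, though this crude comparison ignores interactions between parameters and the variance across random seeds.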

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory