Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
Nils Reimers, Iryna Gurevych

TL;DR
This paper systematically evaluates over 50,000 hyperparameter configurations for deep LSTM networks across five sequence labeling tasks, identifying the parameters that significantly affect performance and providing recommended configurations.
Contribution
It offers a comprehensive analysis of hyperparameter importance for LSTM-based sequence labeling and proposes effective configuration guidelines based on extensive empirical evaluation.
Findings
Pre-trained word embeddings and the choice of the network's last layer have a large impact on performance.
The number of LSTM layers and the number of recurrent units have only a minor impact.
Recommended configurations improve task performance across multiple NLP tasks.
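As a rough illustration of what evaluating many configurations can look like in practice, the sketch below draws random hyperparameter setups from a small search space. It is a minimal sketch, not the authors' procedure: the four parameter names follow the findings above, but every value in `SEARCH_SPACE`, along with the helper names `sample_configuration` and `sample_configurations`, is a hypothetical placeholder chosen for this example.

```python
import random

# Hypothetical search space. The parameter names mirror the findings above;
# the candidate values are illustrative placeholders, not the paper's settings.
SEARCH_SPACE = {
    "word_embeddings": ["embeddings_a", "embeddings_b", "embeddings_c"],
    "last_layer": ["softmax", "crf"],
    "lstm_layers": [1, 2, 3],
    "recurrent_units": [50, 100, 200],
}


def sample_configuration(rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def sample_configurations(n, seed=0):
    """Return n randomly sampled configurations, reproducible via the seed."""
    rng = random.Random(seed)
    return [sample_configuration(rng) for _ in range(n)]


if __name__ == "__main__":
    for config in sample_configurations(5):
        print(config)
```

Each sampled dictionary would then be passed to a training and evaluation routine, which is omitted here. Comparing scores grouped by a single parameter, for example by `last_layer`, is one simple way to see which choices matter most.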
Abstract
Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance. However, little has been published on which parameters and design choices should be evaluated or selected, often making correct hyperparameter optimization a "black art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks (POS, Chunking, NER, Entity Recognition, and Event Detection). We evaluated over 50,000 different setups and found that some parameters, like the pre-trained word embeddings or the last layer of the network, have a large impact on the performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance. We give a…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
