Characterizing the hyper-parameter space of LSTM language models for mixed context applications

Victor Akinwande; Sekou L. Remy

arXiv:1712.03199·cs.CL·December 11, 2017

Open Access

TL;DR

This paper investigates how sensitive the hyper-parameters of an LSTM language model are when the model is applied to a new code-mixed dataset. It finds the model largely robust and discusses what this means for reproducibility in real-world applications.

Contribution

The study characterizes LSTM hyper-parameter sensitivity on a novel code-mixed corpus and finds minimal sensitivity for all but a few hyper-parameters.

Findings

01

Most hyperparameters show minimal sensitivity to the new dataset.

02

A few hyperparameters are the exception and do show sensitivity to the new dataset.

03

Results inform best practices for hyper-parameter tuning in real-world scenarios.

Abstract

Applying state-of-the-art deep learning models to novel real-world datasets gives a practical evaluation of the generalizability of these models. Of importance in this process is how sensitive the hyper-parameters of such models are to novel datasets, as this affects the reproducibility of a model. We present work to characterize the hyper-parameter space of an LSTM for language modeling on a code-mixed corpus. We observe that the evaluated model shows minimal sensitivity to our novel dataset, bar a few hyper-parameters.

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory