Characterizing the hyper-parameter space of LSTM language models for mixed context applications
Victor Akinwande, Sekou L. Remy

TL;DR
This paper investigates how sensitive the hyper-parameters of LSTM language models are when the models are applied to a new code-mixed dataset, highlighting the models' robustness and the implications for reproducibility in real-world applications.
Contribution
The study provides a detailed characterization of LSTM hyper-parameter sensitivity on a novel code-mixed corpus, revealing minimal sensitivity for most hyper-parameters.
Findings
Most hyper-parameters show minimal sensitivity to the new dataset.
A few hyper-parameters are exceptions and significantly affect model performance.
Results inform best practices for hyper-parameter tuning in real-world scenarios.
Abstract
Applying state-of-the-art deep learning models to novel real-world datasets gives a practical evaluation of the generalizability of these models. Of importance in this process is how sensitive the hyper-parameters of such models are to novel datasets, as this affects the reproducibility of a model. We present work to characterize the hyper-parameter space of an LSTM for language modeling on a code-mixed corpus. We observe that the evaluated model shows minimal sensitivity to our novel dataset, bar a few hyper-parameters.
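As a rough illustration of what "sensitivity" of a hyper-parameter can mean in practice, the sketch below scores each hyper-parameter by how much the mean validation score moves across the values tried for it. This is an assumption made for illustration, not the procedure used in the paper: the function name `sensitivity`, the hyper-parameter names, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sensitivity(trials):
    """Score each hyper-parameter by the spread of mean scores across its values.

    trials: list of (config, score) pairs, where config maps a
    hyper-parameter name to the value used in that run and score is a
    validation metric such as perplexity.
    Returns {name: max(mean score per value) - min(mean score per value)}.
    """
    by_param = defaultdict(lambda: defaultdict(list))
    for config, score in trials:
        for name, value in config.items():
            by_param[name][value].append(score)

    result = {}
    for name, groups in by_param.items():
        means = [mean(scores) for scores in groups.values()]
        result[name] = max(means) - min(means)
    return result


# Made-up numbers for illustration only (not results from the paper).
trials = [
    ({"dropout": 0.2, "hidden_size": 256}, 110.0),
    ({"dropout": 0.2, "hidden_size": 512}, 108.0),
    ({"dropout": 0.5, "hidden_size": 256}, 130.0),
    ({"dropout": 0.5, "hidden_size": 512}, 128.0),
]
scores = sensitivity(trials)
```

With these toy inputs, the hypothetical `dropout` setting moves the mean score far more than `hidden_size` does, which is the kind of contrast the abstract describes between a few sensitive hyper-parameters and the rest.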
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
