A Neural Network Approach for Mixing Language Models
Youssef Oualil, Dietrich Klakow

TL;DR
This paper introduces a neural network framework that combines multiple heterogeneous language models in a single architecture through a feature layer and a mixture layer, significantly reducing perplexity with no noticeable increase in the number of model parameters or the training time.
Contribution
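It proposes a single architecture for mixing heterogeneous neural language models: a feature layer learns each model separately, and a mixture layer merges the resulting features, so the combined model benefits from the learning capabilities of each component.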
Findings
Significant perplexity reduction on the Penn Treebank (PTB) and Large Text Compression Benchmark (LTCB) corpora.
The reduction is reported relative to state-of-the-art language models.
The models are combined with no noticeable increase in the number of parameters or the training time.
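The findings above are stated in terms of perplexity, the exponential of the average negative log-probability a model assigns to the observed tokens (lower is better). A minimal sketch of the standard definition follows; the function name `perplexity` and the toy probabilities are illustrative and do not come from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probability the model assigned to each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# A model that is uniformly unsure among 4 choices at every step
# has a perplexity of 4 (up to floating-point rounding).
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that assigns higher probability to the words that actually occur gets a lower value, which is why a perplexity reduction indicates a better language model.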
Abstract
The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures, which are able to learn different natural language characteristics. This paper presents a novel framework, which shows that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models and 2) a mixture layer, which merges the resulting model features. In doing so, this architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art…
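The two-layer structure described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the choice of a feedforward and a recurrent model as the two heterogeneous components, the shared embedding table, concatenation followed by a tanh layer as the mixing step, and all sizes and names (`feedforward_features`, `recurrent_features`, `next_word_distribution`, `params`) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the paper.
VOCAB, EMBED, HIDDEN, MIX, CONTEXT = 50, 8, 16, 16, 3


def init(*shape):
    return rng.normal(scale=0.1, size=shape)


params = {
    "embed": init(VOCAB, EMBED),             # shared word embeddings (assumption)
    "ff_w": init(CONTEXT * EMBED, HIDDEN),   # feedforward model weights
    "rnn_in": init(EMBED, HIDDEN),           # recurrent model input weights
    "rnn_rec": init(HIDDEN, HIDDEN),         # recurrent model state weights
    "mix_w": init(2 * HIDDEN, MIX),          # mixture layer weights
    "out_w": init(MIX, VOCAB),               # output projection
}


def feedforward_features(history, p):
    """Feature-layer model 1: fixed window over the last CONTEXT words."""
    window = p["embed"][history[-CONTEXT:]].reshape(-1)
    return np.tanh(window @ p["ff_w"])


def recurrent_features(history, p):
    """Feature-layer model 2: simple recurrent state over the whole history."""
    h = np.zeros(HIDDEN)
    for word in history:
        h = np.tanh(p["embed"][word] @ p["rnn_in"] + h @ p["rnn_rec"])
    return h


def next_word_distribution(history, p):
    """Mixture layer: merge both feature vectors, then predict the next word."""
    features = np.concatenate(
        [feedforward_features(history, p), recurrent_features(history, p)]
    )
    mixed = np.tanh(features @ p["mix_w"])
    logits = mixed @ p["out_w"]
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# The history must contain at least CONTEXT word ids.
probs = next_word_distribution([1, 2, 3, 4], params)
```

In this sketch the only parameters added on top of the component models are those of the mixing layer, which is one way to read the abstract's claim that combining models causes no noticeable growth in model size.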
