A Neural Network Approach for Mixing Language Models

Youssef Oualil; Dietrich Klakow

arXiv:1708.06989·cs.CL·August 24, 2017

TL;DR

This paper introduces a neural network framework that combines multiple heterogeneous language models in a single architecture through a feature layer and a mixture layer, significantly reducing perplexity with no noticeable increase in the number of parameters or the training time.

Contribution

It proposes a single architecture for mixing heterogeneous neural network language models: a feature layer learns the different models separately, and a mixture layer merges the resulting model features.

Findings

01. Significant perplexity reduction on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus.

02. Improved language modeling performance over existing architectures.

03. Efficient combination of models with no noticeable increase in the number of parameters or the training time.

Abstract

The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures, which are able to learn different natural language characteristics. This paper presents a novel framework, which shows that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models and 2) a mixture layer, which merges the resulting model features. In doing so, this architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art…
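
The two-stage design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the choice of a feedforward and a simple recurrent model as the two heterogeneous feature extractors, the shared embedding matrix, the tanh activations, all layer sizes, and every function and parameter name below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
VOCAB, EMB, HID, MIX, CONTEXT = 50, 8, 16, 12, 3


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init(*shape):
    return rng.normal(0.0, 0.1, size=shape)


params = {
    "emb": init(VOCAB, EMB),           # embeddings shared by both models (assumption)
    "W_ff": init(CONTEXT * EMB, HID),  # feedforward model over a fixed window
    "W_in": init(EMB, HID),            # recurrent model: input weights
    "W_rec": init(HID, HID),           # recurrent model: hidden-to-hidden weights
    "W_mix": init(2 * HID, MIX),       # mixture layer over concatenated features
    "W_out": init(MIX, VOCAB),         # output projection to the vocabulary
}


def feature_layer(context_ids, p):
    """Compute one feature vector per model from the same word history."""
    x = p["emb"][context_ids]                # (CONTEXT, EMB)
    ff = np.tanh(x.reshape(-1) @ p["W_ff"])  # feedforward features
    h = np.zeros(HID)
    for x_t in x:                            # simple recurrent features
        h = np.tanh(x_t @ p["W_in"] + h @ p["W_rec"])
    return ff, h


def mixture_layer(features, p):
    """Merge the per-model features into a single hidden representation."""
    return np.tanh(np.concatenate(features) @ p["W_mix"])


def next_word_distribution(context_ids, p=params):
    mixed = mixture_layer(feature_layer(context_ids, p), p)
    return softmax(mixed @ p["W_out"])


def perplexity(token_ids, p=params):
    """Perplexity: exp of the average negative log-probability per predicted word."""
    log_probs = []
    for t in range(CONTEXT, len(token_ids)):
        probs = next_word_distribution(token_ids[t - CONTEXT:t], p)
        log_probs.append(np.log(probs[token_ids[t]]))
    return float(np.exp(-np.mean(log_probs)))
```

Calling `next_word_distribution([1, 2, 3])` returns a probability vector over the toy vocabulary, and `perplexity` evaluates a token sequence under the untrained weights. In this sketch the mixture and output layers are shared, so adding a second feature extractor only adds that extractor's own weights plus a wider mixture input.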
