Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Gabriel Lindenmaier; Sean Papay; Sebastian Padó

arXiv:2502.00617 · cs.CL · February 4, 2025

TL;DR

This paper explores hybrid RNN-Transformer architectures for language modeling in low-resource settings. The reduced models outperform existing models of comparable size and approach the performance of larger models with significantly fewer parameters.

Contribution

The paper introduces hybrid architectures that selectively replace attention layers in a transformer with feed-forward and quasi-recurrent neural network layers, targeting efficiency in low-data regimes.

Findings

1. Outperforms existing models with similar parameter counts
2. Achieves comparable performance to larger models
3. Reduces training costs and model size

Abstract

Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades. In this paper, we investigate transformer-based architectures for improving model performance in a low-data regime by selectively replacing attention layers with feed-forward and quasi-recurrent neural network layers. We test these architectures on the standard Enwik8 and Wikitext-103 corpora. Our results show that our reduced architectures outperform existing models with a comparable number of parameters, and obtain comparable performance to larger models while significantly reducing the number of parameters.
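
To make the idea of a quasi-recurrent layer concrete, the following is a minimal sketch in plain Python. It follows the general QRNN formulation (a causal convolution that produces candidate, forget, and output gates, followed by element-wise "fo-pooling" over time) and is not the configuration used in this paper, which the abstract does not specify. The function names, the window width `k=2`, the toy dimensions, and the random weights are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_conv(xs, weights, k):
    """Width-k causal convolution over a sequence.

    xs:      list of T input vectors, each of length d_in
    weights: list of k matrices (d_out x d_in); weights[j] is applied to x[t-j]
    Output step t depends only on inputs t-k+1 .. t (left zero padding).
    """
    d_out = len(weights[0])
    out = []
    for t in range(len(xs)):
        y = [0.0] * d_out
        for j in range(k):
            if t - j < 0:
                continue  # implicit zero padding before the sequence start
            x = xs[t - j]
            for r in range(d_out):
                y[r] += sum(w * v for w, v in zip(weights[j][r], x))
        out.append(y)
    return out


def qrnn_layer(xs, Wz, Wf, Wo, k=2):
    """Hypothetical QRNN layer with fo-pooling (illustrative only)."""
    # Gates are computed for all time steps at once (parallelisable part).
    z = [[math.tanh(v) for v in row] for row in causal_conv(xs, Wz, k)]
    f = [[sigmoid(v) for v in row] for row in causal_conv(xs, Wf, k)]
    o = [[sigmoid(v) for v in row] for row in causal_conv(xs, Wo, k)]

    # The only sequential part: a cheap element-wise recurrence.
    c = [0.0] * len(z[0])
    hs = []
    for zt, ft, ot in zip(z, f, o):
        c = [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(ft, c, zt)]
        hs.append([oi * ci for oi, ci in zip(ot, c)])
    return hs


def random_weights(k, d_out, d_in, rng):
    """Toy weight initialisation; a real model would learn these."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_in)]
             for _ in range(d_out)] for _ in range(k)]


# Toy usage: 5 time steps, 3 input features, 4 hidden units.
rng = random.Random(0)
T, d_in, d_out, k = 5, 3, 4, 2
xs = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(T)]
Wz, Wf, Wo = (random_weights(k, d_out, d_in, rng) for _ in range(3))
hs = qrnn_layer(xs, Wz, Wf, Wo, k)
```

In this sketch the matrix products sit in the convolution, which has no dependence between time steps, while the recurrence itself is element-wise. That split is the usual argument for why such a layer can be cheaper than full self-attention, whose cost grows quadratically with sequence length.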

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling