Reusing Weights in Subword-aware Neural Language Models

Zhenisbek Assylbekov; Rustem Takhanov

arXiv:1802.08375·cs.CL·April 26, 2018

TL;DR

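This paper proposes several ways of reusing subword embeddings and other weights in subword-aware neural language models. The techniques do not help a competitive character-aware model, but some of them improve syllable- and morpheme-aware models while significantly reducing model size. One simple principle guides the tying: in a multi-layer input embedding model, layers reused at the output should be tied consecutively from the bottom up.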

Contribution

It proposes techniques for reusing input-side subword embeddings and other weights at the output of subword-aware language models, and shows that some of them improve syllable- and morpheme-aware models while cutting parameter counts.

Findings

01

Some weight-reuse techniques improve syllable- and morpheme-aware models, but none benefit a competitive character-aware model.

02

With properly reused weights, the best morpheme-aware model has 20%-87% fewer parameters than the competitive word-level model.

03

The best morpheme-aware model beats the competitive word-level model by a large margin across multiple languages.

Abstract

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
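The bottom-up tying principle can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy segmentation, the single tanh layer standing in for the upper layers of the embedding model, and every name below (`word_to_subwords`, `embed_words`, `make_output_params`, `count_unique_params`, `next_word_probs`) are assumptions made for illustration. The sketch builds word vectors from summed subword embeddings, then reuses the first `n_tied_layers` input-side layers, counted from the bottom, to build the output word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 words, each segmented into subword (morpheme) ids.
word_to_subwords = [[0, 1], [2], [0, 3]]
n_subwords, d = 4, 5


def init_params():
    """Fresh weights for a two-layer word embedding model."""
    return {
        "emb": rng.normal(size=(n_subwords, d)),  # layer 1: subword embeddings
        "W": rng.normal(size=(d, d)),             # layer 2: stand-in transform
        "b": np.zeros(d),
    }


def embed_words(params):
    """Sum each word's subword embeddings, then apply one tanh layer."""
    summed = np.stack(
        [params["emb"][ids].sum(axis=0) for ids in word_to_subwords]
    )
    return np.tanh(summed @ params["W"] + params["b"])


def make_output_params(input_params, n_tied_layers):
    """Tie the first n_tied_layers layers bottom-up; keep the rest separate."""
    out = init_params()
    if n_tied_layers >= 1:  # reuse the subword embeddings
        out["emb"] = input_params["emb"]
    if n_tied_layers >= 2:  # additionally reuse the layer above them
        out["W"], out["b"] = input_params["W"], input_params["b"]
    return out


def count_unique_params(*param_sets):
    """Count scalars, counting arrays shared between sets only once."""
    sizes = {id(a): a.size for ps in param_sets for a in ps.values()}
    return sum(sizes.values())


def next_word_probs(hidden, output_params):
    """Softmax over words, using output embeddings built from subwords."""
    logits = embed_words(output_params) @ hidden
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


input_params = init_params()
for n in (0, 1, 2):
    total = count_unique_params(input_params, make_output_params(input_params, n))
    print(f"tied layers: {n}, unique parameters: {total}")
```

In this toy setting the unique parameter count drops from 100 (no tying) to 80 (embeddings tied) to 50 (embeddings and the layer above tied). The sketch deliberately offers no option to tie the upper layer while leaving the embeddings untied, since that would skip a lower layer and break the consecutive bottom-up order described in the abstract.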

Code & Models

Repositories

zh3nis/morph-sum (TensorFlow, official)
