TL;DR
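This paper proposes ways to reuse subword embeddings and other weights in subword-aware neural language models. Reuse improves syllable- and morpheme-aware models while shrinking them, does not help a competitive character-aware model, and follows a simple principle: input layers reused at the output should be tied consecutively bottom-up.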
Contribution
It proposes several techniques for reusing subword embeddings and other weights in subword-aware language models, and shows that some of them improve syllable- and morpheme-aware models while significantly reducing model size.
Findings
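Some reuse techniques improve syllable- and morpheme-aware models; they do not benefit a competitive character-aware model.
With properly reused weights, the best morpheme-aware model has 20%-87% fewer parameters than the competitive word-level model.
The same morpheme-aware model beats the word-level model by a large margin across multiple languages.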
Abstract
We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.
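The tying principle stated in the abstract can be illustrated with a small check. This is a minimal sketch, not the authors' code: it assumes the input embedding stack is indexed from 0 (the bottom, subword-embedding layer) upward, and the function name `is_consecutive_bottom_up` and the layer names in `LAYERS` are hypothetical, chosen only for illustration.

```python
# Hypothetical input stack, listed bottom-up; the names are illustrative only.
LAYERS = ["subword_embedding", "composition", "highway"]


def is_consecutive_bottom_up(tied_layers, num_layers):
    """Return True if the layers reused at the output form an unbroken
    run starting at the bottom layer (index 0) of the input stack."""
    tied = sorted(set(tied_layers))
    if any(i < 0 or i >= num_layers for i in tied):
        raise ValueError("layer index out of range")
    # A valid selection is exactly the prefix 0, 1, ..., k-1.
    return tied == list(range(len(tied)))


if __name__ == "__main__":
    for choice in ([0], [0, 1], [0, 1, 2], [1], [0, 2]):
        names = [LAYERS[i] for i in choice]
        ok = is_consecutive_bottom_up(choice, len(LAYERS))
        print(names, "follows the principle" if ok else "violates the principle")
```

Under this reading, reusing only the subword embeddings, or the embeddings together with the layer directly above them, follows the principle, while reusing an upper layer without the layers beneath it does not.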
