HashFormers: Towards Vocabulary-independent Pre-trained Transformers

Huiyin Xue; Nikolaos Aletras

arXiv:2210.07904·cs.CL·November 1, 2022

Open Access

TL;DR

HashFormers introduce a vocabulary-independent pre-trained transformer architecture that uses hashing to significantly reduce memory usage while maintaining competitive performance on text classification tasks.

Contribution

This work presents HashFormers, a new family of vocabulary-independent pre-trained transformers that use hashing to replace the large one-to-one embedding matrix with a substantially smaller fixed-size one, which lets them support an unlimited vocabulary.

Findings

1. HashFormers are more memory efficient than standard pre-trained transformers.

2. They achieve comparable performance when fine-tuned on text classification tasks.

3. The most efficient variant uses only 99.1K parameters to represent the embeddings, with minimal performance loss.

Abstract

Transformer-based pre-trained language models are vocabulary-dependent: by default, each token is mapped to its corresponding embedding. This one-to-one mapping results in embedding matrices that occupy a lot of memory (i.e., millions of parameters) and grow linearly with the size of the vocabulary. Previous work on on-device transformers dynamically generates token embeddings on the fly, without embedding matrices, using locality-sensitive hashing over morphological information. These embeddings are subsequently fed into transformer layers for text classification. However, these methods are not pre-trained. Inspired by this line of work, we propose HashFormers, a new family of vocabulary-independent pre-trained transformers that support an unlimited vocabulary (i.e., all possible tokens in a corpus) given a substantially smaller fixed-size embedding matrix. We achieve this by first…
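To make the idea of a fixed-size embedding matrix that serves an unbounded vocabulary concrete, here is a minimal sketch in plain Python. It is an illustration of the general hashing trick, not the authors' method: the abstract is cut off before it describes the actual hashing scheme. The MD5-based `bucket_id` function, the `HashedEmbedding` class, and the sizes (1000 buckets, 16 dimensions) are all assumptions chosen for the example.

```python
import hashlib
import random
from typing import List


def bucket_id(token: str, num_buckets: int) -> int:
    """Map any string to a bucket index in [0, num_buckets) with a stable hash.

    MD5 is an assumption made for this sketch, not the hash used in the paper.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class HashedEmbedding:
    """Fixed-size embedding table indexed by hash bucket, not by vocabulary ID."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        # One row per bucket: the row count is independent of the vocabulary.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def lookup(self, token: str) -> List[float]:
        # Any token, seen before or not, resolves to some row of the table.
        return self.table[bucket_id(token, self.num_buckets)]

    def num_parameters(self) -> int:
        return self.num_buckets * self.dim


emb = HashedEmbedding(num_buckets=1000, dim=16)
for tok in ["transformer", "hashing", "never-seen-token-42"]:
    vec = emb.lookup(tok)
    print(tok, bucket_id(tok, emb.num_buckets), len(vec))
print("embedding parameters:", emb.num_parameters())
```

In this sketch the parameter count is `num_buckets * dim` regardless of how many distinct tokens appear, whereas a one-to-one embedding matrix needs `vocabulary_size * dim` parameters. The trade-off is that distinct tokens can collide in the same bucket and then share a vector.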

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis