Memory-Efficient FastText: A Comprehensive Approach Using Double-Array Trie Structures and Mark-Compact Memory Management
Yimin Du

TL;DR
This paper introduces a memory-efficient FastText variant that employs double-array trie structures and mark-compact memory management, significantly reducing memory usage while preserving embedding quality for large vocabularies.
Contribution
It presents a novel memory optimization framework for FastText using DA-trie structures and similarity-based compression, enabling large-scale deployment with reduced memory footprint.
Findings
Achieved 4:1 to 10:1 compression ratios without quality loss.
Reduced memory from 100 GB to 30 GB on a Chinese vocabulary of 30 million entries.
Demonstrated industrial deployment benefits including cost savings and faster loading.
Abstract
FastText has established itself as a fundamental algorithm for learning word representations, demonstrating exceptional capability in handling out-of-vocabulary words through character-level n-gram embeddings. However, its hash-based bucketing mechanism introduces critical limitations for large-scale industrial deployment: hash collisions cause semantic drift, and memory requirements become prohibitively expensive when dealing with real-world vocabularies containing millions of terms. This paper presents a comprehensive memory optimization framework that fundamentally reimagines FastText's memory management through the integration of double-array trie (DA-trie) structures and mark-compact garbage collection principles. Our approach leverages the linguistic insight that n-grams sharing common prefixes or suffixes exhibit highly correlated embeddings due to co-occurrence patterns in…
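The collision problem described in the abstract can be illustrated with a minimal sketch. It assumes a standard 32-bit FNV-1a hash over UTF-8 bytes and FastText-style character n-grams wrapped in `<` and `>`. The function names (`fnv1a_32`, `char_ngrams`, `bucket_collisions`) and the tiny bucket count are hypothetical choices for illustration, not the paper's code or the FastText library's API. Unrelated n-grams that land in the same bucket would share one embedding row, which is the mechanism behind the drift the abstract mentions.

```python
from collections import defaultdict


def fnv1a_32(text: str) -> int:
    """Standard 32-bit FNV-1a hash over the UTF-8 bytes of `text`."""
    h = 2166136261  # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of `<word>` for every n in [minn, maxn]."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def bucket_collisions(words: list[str], num_buckets: int) -> dict[int, set[str]]:
    """Map each bucket index to the distinct n-grams sharing it (collisions only)."""
    buckets: dict[int, set[str]] = defaultdict(set)
    for word in words:
        for gram in char_ngrams(word):
            buckets[fnv1a_32(gram) % num_buckets].add(gram)
    # Keep only buckets holding more than one distinct n-gram.
    return {b: grams for b, grams in buckets.items() if len(grams) > 1}


# Assumption: a deliberately tiny bucket count so collisions are easy to see.
collisions = bucket_collisions(["where", "there", "here"], num_buckets=8)
for bucket, grams in sorted(collisions.items()):
    print(bucket, sorted(grams))
```

With a realistic bucket count in the millions, collisions are rarer per n-gram but still unavoidable once the number of distinct n-grams exceeds the number of buckets. An exact-key structure such as a trie avoids this by giving every stored n-gram its own entry.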
Taxonomy
Topics: Parallel Computing and Optimization Techniques
