Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance
Lingfeng Shen, Haiyun Jiang, Lemao Liu, Ying Chen

TL;DR
This paper introduces a frequency-aware method that uses the Mixed Product Distance (MPD) metric to select a proper dimension for static word embeddings, improving efficiency and performance without training any word embeddings.
Contribution
It proposes a novel MPD-based dimension selection technique that accounts for word frequency, enhancing static embedding quality without training new models.
Findings
The MPD-based method outperforms baselines in both efficiency and accuracy
Word frequency significantly influences dimension selection
Post-processing reduces frequency bias in embeddings
Abstract
Static word embedding is still useful, particularly for context-unavailable tasks, because in the case of no context available, pre-trained language models often perform worse than static word embeddings. Although dimension is a key factor determining the quality of static word embeddings, automatic dimension selection is rarely discussed. In this paper, we investigate the impact of word frequency on the dimension selection, and empirically find that word frequency is so vital that it needs to be taken into account during dimension selection. Based on such an empirical finding, this paper proposes a dimension selection method that uses a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding. Through applying a post-processing function to oracle matrices, the MPD-based method can de-emphasize the impact of word…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
