Better Embeddings with Coupled Adam
Felix Stollenwerk, Tobias Stollenwerk

TL;DR
This paper introduces Coupled Adam, a modified optimizer designed to reduce anisotropy in the word embeddings learned by large language models. It improves embedding quality and, on sufficiently large datasets, upstream and downstream performance.
Contribution
The paper argues that the second moment in Adam is a cause of anisotropic embeddings and proposes a modified optimizer, Coupled Adam, to mitigate the problem.
Findings
Coupled Adam reduces anisotropy in the learned embeddings.
Coupled Adam significantly improves embedding quality.
Coupled Adam improves upstream and downstream performance on sufficiently large datasets.
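To make "anisotropy" concrete, here is a minimal sketch of one common proxy: the average cosine similarity between distinct embedding vectors. The metric choice, the function name `mean_cosine_similarity`, and the synthetic data are all assumptions for illustration. This summary does not say which metrics the paper reports.

```python
import numpy as np


def mean_cosine_similarity(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows.

    Values near 0 suggest directions are spread evenly (isotropic);
    values near 1 suggest the vectors share a dominant direction.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    n = embeddings.shape[0]
    # Exclude the diagonal, since self-similarity is always 1.
    off_diag_sum = sims.sum() - np.trace(sims)
    return float(off_diag_sum / (n * (n - 1)))


# Synthetic data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(1000, 64))
# A shared offset added to every vector mimics a common drift direction.
anisotropic = isotropic + 5.0 * np.ones(64)

iso_score = mean_cosine_similarity(isotropic)
aniso_score = mean_cosine_similarity(anisotropic)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")
```

In this toy setup the shifted embeddings score much higher than the unshifted ones, which is the kind of gap an optimizer change aimed at reducing anisotropy would be expected to narrow.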
Abstract
Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
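The abstract does not spell out how Coupled Adam changes the update, so the sketch below is only one plausible reading. It implements a standard Adam step and adds a hypothetical `couple_rows` flag that averages the bias-corrected second moment over the vocabulary axis, so that all embedding vectors share the same per-dimension scaling. The flag name, the averaging axis, and the toy gradients are assumptions, not details confirmed by this summary.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, couple_rows=False):
    """One Adam update for an embedding matrix of shape (vocab, dim)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    if couple_rows:
        # ASSUMPTION: "coupling" is modeled as averaging the second moment
        # over the vocabulary axis, so every row is scaled identically.
        v_hat = v_hat.mean(axis=0, keepdims=True)
    new_param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_param, m, v


# Toy example: row 0 stands for a frequent token with large gradients,
# row 1 for a rare token with small gradients (hypothetical values).
lr = 1e-3
param = np.zeros((2, 2))
grad = np.array([[1.0, 1.0], [0.01, 0.01]])
zeros = np.zeros_like(param)

std_param, _, _ = adam_step(param, grad, zeros, zeros, t=1, lr=lr)
cpl_param, _, _ = adam_step(param, grad, zeros, zeros, t=1, lr=lr,
                            couple_rows=True)

std_update = param - std_param   # per-element step under standard Adam
cpl_update = param - cpl_param   # per-element step with shared second moment
```

With standard Adam, the per-element normalization gives both rows steps of nearly the same size even though their gradients differ by a factor of 100. With the shared second moment, step sizes stay proportional to the gradients. Whether this matches the paper's exact formulation should be checked against the paper itself.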
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Face recognition and analysis
Methods: Adam
