Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions
Haoran Xu, Philipp Koehn

TL;DR
This paper proposes a context-aware, dictionary-free mapping method for cross-lingual BERT embeddings that improves bilingual dictionary induction by addressing anisotropy and anisometry through normalization, and explores sense-level embeddings for finer alignment.
Contribution
It introduces a novel context-aware mapping approach leveraging parallel corpora, and demonstrates the importance of normalization and sense-level embeddings for better cross-lingual alignment.
Findings
Contextual embedding space mapping outperforms previous multilingual word embedding methods on bilingual dictionary induction (BDI).
Iterative normalization mitigates the anisotropy and anisometry of contextual embedding spaces.
Sense-level embeddings yield more precise mappings.
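As an illustration of the normalization finding, here is a minimal sketch of iterative normalization. It assumes the commonly used variant that alternates unit-length scaling with mean centering; the paper's exact procedure is not stated in this summary, and the function name and the `n_iter` parameter are hypothetical.

```python
import numpy as np

def iterative_normalization(X, n_iter=5, eps=1e-12):
    """Alternate unit-length scaling and mean centering (assumed variant)."""
    X = np.asarray(X, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Scale every embedding (row) to unit Euclidean length.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        X = X / np.maximum(norms, eps)
        # Shift the whole space so that its mean vector is zero.
        X = X - X.mean(axis=0, keepdims=True)
    return X
```

Because centering is the last step of each pass, the output has zero mean exactly, while the row norms only approach 1 as the number of passes grows.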
Abstract
Typically, a linear orthogonal transformation is learned by aligning static type-level embeddings to build a shared semantic space. Because contextual embeddings contain richer semantic features, we investigate a context-aware and dictionary-free mapping approach that leverages parallel corpora. We show that our contextual embedding space mapping significantly outperforms previous multilingual word embedding methods on the bilingual dictionary induction (BDI) task by providing a higher degree of isomorphism. To improve the quality of the mapping, we also explore sense-level embeddings that are split from type-level representations, which can align spaces at a finer resolution and yield more precise mappings. Moreover, we reveal that contextual embedding spaces suffer from two natural properties: anisotropy and anisometry. To mitigate these two…
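The orthogonal mapping mentioned in the abstract can be sketched with the standard closed-form Procrustes solution, followed by nearest-neighbor retrieval as a simple form of BDI. This is a sketch under stated assumptions, not the authors' implementation: it assumes that rows of `X` and `Y` are already paired source and target embeddings (for example, obtained from word-aligned parallel sentences), and the function names are hypothetical.

```python
import numpy as np

def learn_orthogonal_map(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (Procrustes solution).

    X, Y: arrays of shape (n_pairs, dim); row i of X is paired with row i of Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce_translations(src, tgt, W):
    """For each source row, return the index of the most similar target row.

    Similarity is cosine similarity after mapping the source rows with W.
    """
    mapped = src @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return (mapped @ tgt_n.T).argmax(axis=1)
```

Restricting `W` to be orthogonal preserves distances and angles in the source space, which is why the quality of such a mapping depends on how close the two spaces are to being isomorphic.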
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
