Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search
Andor Diera, Lukas Galke, Ansgar Scherp

TL;DR
This paper investigates how the isotropy of embedding spaces affects semantic code search and introduces a Soft-ZCA whitening method to improve model performance by adjusting isotropy levels.
Contribution
It proposes Soft-ZCA whitening, a modified ZCA whitening technique that controls the isotropy of embeddings and improves the semantic code search performance of pre-trained code language models.
Findings
Soft-ZCA whitening improves search performance
Isotropy levels correlate with search effectiveness
Method complements contrastive fine-tuning
Abstract
Low isotropy in an embedding space impairs performance on tasks involving semantic inference. Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate this issue. We analyze various code language models, examine the isotropy of their embedding spaces, and assess its influence on search effectiveness. We propose a modified ZCA whitening technique to control isotropy levels in embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
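To make the idea of "controlling isotropy" concrete, here is a minimal NumPy sketch of a softened ZCA whitening transform. The abstract does not spell out the exact modification, so this sketch assumes that the "soft" part is a regularizer eps added to the covariance eigenvalues before inverting them: a very small eps gives (near) full whitening, and a larger eps whitens less aggressively. The function name soft_zca_whiten, the eps default, and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def soft_zca_whiten(X, eps=1e-2):
    """Softened ZCA whitening of row-wise embeddings X (n_samples, dim).

    Assumption: eps is added to the covariance eigenvalues, so larger
    eps means weaker whitening. Returns (whitened X, mean, W).
    """
    mu = X.mean(axis=0, keepdims=True)
    Xc = X - mu                                   # center the embeddings
    cov = Xc.T @ Xc / X.shape[0]                  # (dim, dim) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov is symmetric
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negatives
    scale = 1.0 / np.sqrt(eigvals + eps)
    W = eigvecs @ np.diag(scale) @ eigvecs.T      # ZCA: rotate, scale, rotate back
    return Xc @ W, mu, W

# Toy usage: anisotropic "code" embeddings and one "query" embedding.
rng = np.random.default_rng(0)
mix = rng.normal(size=(8, 8))
code_emb = rng.normal(size=(500, 8)) @ mix
query_emb = rng.normal(size=(1, 8)) @ mix

code_w, mu, W = soft_zca_whiten(code_emb, eps=1e-2)
query_w = (query_emb - mu) @ W                    # reuse the same mean and W

# Rank code snippets by cosine similarity in the whitened space.
code_n = code_w / np.linalg.norm(code_w, axis=1, keepdims=True)
query_n = query_w / np.linalg.norm(query_w, axis=1, keepdims=True)
ranking = np.argsort(-(code_n @ query_n.T).ravel())
```

In this sketch the whitening statistics are estimated on the code embeddings and reused for the query. Whether to fit separate transforms for queries and code, and which eps to use, are design choices that would need to be tuned per model.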
Taxonomy
Topics: Software Engineering Research · Modular Robots and Swarm Intelligence · Machine Learning in Materials Science
Methods: ZCA Whitening
