Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search

Andor Diera; Lukas Galke; Ansgar Scherp

arXiv:2411.17538·cs.CL·November 28, 2024

TL;DR

This paper investigates how the isotropy of embedding spaces affects semantic code search and introduces a Soft-ZCA whitening method to improve model performance by adjusting isotropy levels.

Contribution

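The paper proposes Soft-ZCA whitening, a modified ZCA whitening technique that controls the level of isotropy in embeddings. Applied as a post-processing step, it improves the semantic code search performance of pre-trained code language models and can complement contrastive fine-tuning.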

Findings

01 Soft-ZCA whitening improves search performance

02 Isotropy levels correlate with search effectiveness

03 Method complements contrastive fine-tuning
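
To make "isotropy level" in finding 02 concrete, the sketch below computes one common proxy: the mean pairwise cosine similarity of a set of embeddings. Values near 0 suggest directions are spread evenly, while values near 1 suggest the embeddings occupy a narrow cone. This is an illustrative assumption and may not be the isotropy measure used in the paper; the function name `mean_pairwise_cosine` is hypothetical.

```python
import numpy as np


def mean_pairwise_cosine(embeddings):
    """Mean cosine similarity over all distinct pairs of rows.

    Illustrative isotropy proxy (an assumption, not necessarily the
    paper's metric): lower values indicate a more isotropic space.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    # Normalize each row to unit length, guarding against zero vectors.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    sims = Xn @ Xn.T
    n = X.shape[0]
    # Exclude the diagonal (self-similarity) from the average.
    off_diag_sum = sims.sum() - np.trace(sims)
    return off_diag_sum / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iso = rng.normal(size=(200, 64))   # roughly isotropic cloud
    aniso = iso + 20.0                 # same cloud with a large shared offset
    print("isotropic:  ", mean_pairwise_cosine(iso))
    print("anisotropic:", mean_pairwise_cosine(aniso))
```

A large shared offset pushes every vector in nearly the same direction, which is the kind of low-isotropy geometry that post-processing such as whitening aims to correct.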

Abstract

Low isotropy in an embedding space impairs performance on tasks involving semantic inference. Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate this issue. We analyze various code language models, examining the isotropy of their embedding spaces and its influence on search effectiveness. We propose a modified ZCA whitening technique to control isotropy levels in embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
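
The abstract does not spell out the whitening formula, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. Standard ZCA whitening uses the transform W = U Λ^(-1/2) Uᵀ, where U and Λ are the eigenvectors and eigenvalues of the embedding covariance. The sketch assumes the "soft" variant adds a regularizer to the eigenvalues, W = U (Λ + εI)^(-1/2) Uᵀ, so that a larger ε gives weaker whitening. The names `soft_zca_whiten` and `eps` are hypothetical.

```python
import numpy as np


def soft_zca_whiten(embeddings, eps=1e-2):
    """Whiten embeddings with an eigenvalue-regularized ZCA transform.

    Assumed form (not confirmed by the abstract):
        W = U (Lambda + eps * I)^(-1/2) U^T
    Returns the whitened embeddings, the mean, and W, so the same
    statistics can be reused on other embeddings.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    mean = X.mean(axis=0, keepdims=True)
    Xc = X - mean
    # Covariance of the centered embeddings (features x features).
    cov = Xc.T @ Xc / X.shape[0]
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues caused by floating-point error.
    eigvals = np.clip(eigvals, 0.0, None)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, mean, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic anisotropic data standing in for code embeddings.
    X = rng.normal(size=(500, 4)) @ np.diag([10.0, 3.0, 1.0, 0.5]) + 5.0
    for eps in (1e-10, 1e-1, 1.0):
        Z, _, _ = soft_zca_whiten(X, eps=eps)
        cov = Z.T @ Z / Z.shape[0]
        print(eps, np.round(np.diag(cov), 3))
```

Under this assumed form, each covariance eigenvalue λ becomes λ / (λ + ε) after whitening: a very small ε approaches full whitening (identity covariance), while a larger ε leaves low-variance directions less amplified, which is one way to tune the resulting isotropy level.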

Code & Models

Repositories

drndr/code_isotropy (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Modular Robots and Swarm Intelligence · Machine Learning in Materials Science

Methods: ZCA Whitening