Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection
Adelaide Danilov, Aria Nourbakhsh, Christoph Schommer

TL;DR
This paper introduces Cluster Purge Loss, a novel training framework that improves transformer embeddings for equivalent mutant detection by emphasizing intra-class semantic distinctions, leading to better performance and interpretability.
Contribution
The paper proposes Cluster Purge Loss, a new combined loss function that enhances embedding space structuring for code similarity tasks, specifically in detecting equivalent code mutants.
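To make "combined loss function" concrete, here is a minimal sketch of how a cross-entropy term and a metric-learning term can be summed into one objective. The metric term used here is a generic pairwise contrastive loss chosen as a stand-in; it is not the paper's Cluster Purge Loss, whose exact formulation this page does not give. The function names, the `weight` and `margin` parameters, and their default values are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive(embeddings, labels, margin=1.0):
    """Generic contrastive term (a stand-in, NOT Cluster Purge Loss).

    Same-label pairs are penalized by their squared distance;
    different-label pairs are penalized only when closer than `margin`.
    """
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            count += 1
    return total / count if count else 0.0


def combined_loss(logits_batch, embeddings, labels, weight=0.5, margin=1.0):
    """Mean cross-entropy plus a weighted metric-learning term.

    `weight` is a hypothetical balancing coefficient, not a value
    taken from the paper.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return ce + weight * pairwise_contrastive(embeddings, labels, margin)
```

In this sketch the cross-entropy term shapes the decision boundary while the second term acts directly on distances in the embedding space, which is the general division of labor the paper's framing suggests. How the actual objective treats instances within a class would replace `pairwise_contrastive`.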
Findings
Achieves state-of-the-art results in equivalent mutant detection
Produces more interpretable embedding spaces
Outperforms traditional fine-tuning methods
Abstract
Recent pre-trained transformer models achieve superior performance in various code processing tasks. However, although effective at optimizing decision boundaries, common approaches for fine-tuning them for downstream classification tasks (distance-based methods or training an additional classification head) often fail to thoroughly structure the embedding space to reflect nuanced intra-class semantic relationships. Equivalent code mutant detection is one of these tasks, where the quality of the embedding space is crucial to the performance of the models. We introduce a novel framework that integrates cross-entropy loss with a deep metric learning objective, termed Cluster Purge Loss. This objective, unlike conventional approaches, concentrates on adjusting fine-grained differences within each class, encouraging the separation of instances based on semantic equivalence to the…
