Complexity-based code embeddings

Rares Folea; Radu Iacob; Emil Slusanschi; Traian Rebedea

arXiv:2601.00924·cs.LG·January 6, 2026

TL;DR

This paper introduces a method that converts source code into numerical embeddings by dynamically profiling programs on different inputs and fitting generic complexity functions to the measured metrics. The resulting embeddings are used as features for code classification.

Contribution

It proposes a novel complexity-based code embedding technique and demonstrates its effectiveness with an XGBoost classifier on real-world programming competition data.

Findings

01. Achieved high F1-score on multi-label code classification
02. Demonstrated the effectiveness of complexity-based embeddings
03. Provided a general framework for code representation

Abstract

This paper presents a generic method for transforming the source code of various algorithms into numerical embeddings, by dynamically analysing the behaviour of computer programs against different inputs and by tailoring multiple generic complexity functions to the analysed metrics. The algorithm embeddings used are based on r-Complexity. Using the proposed code embeddings, we present an implementation of the XGBoost algorithm that achieves an average F1-score on a multi-label dataset with 11 classes, built using real-world code snippets submitted for programming competitions on the Codeforces platform.
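The idea of fitting complexity functions to measured program behaviour can be illustrated with a minimal sketch. It is not the paper's r-Complexity procedure: the candidate functions, the single-constant least-squares fit, the step-counting metric, and all names (`CANDIDATES`, `fit_constant`, `embed`, `best_class`) are assumptions made for illustration only.

```python
import math

# Hypothetical candidate complexity functions g(n); the paper's set may differ.
CANDIDATES = {
    "log n": lambda n: math.log2(n),
    "n": lambda n: float(n),
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: float(n * n),
}


def fit_constant(sizes, values, g):
    """Least-squares fit of values ~ c * g(n); returns (c, relative error)."""
    gs = [g(n) for n in sizes]
    c = sum(v * x for v, x in zip(values, gs)) / sum(x * x for x in gs)
    sse = sum((v - c * x) ** 2 for v, x in zip(values, gs))
    norm = sum(v * v for v in values)
    return c, math.sqrt(sse / norm)


def embed(metric, sizes):
    """Build a fixed-length vector: (constant, error) for each candidate."""
    values = [metric(n) for n in sizes]
    vector = []
    for g in CANDIDATES.values():
        c, err = fit_constant(sizes, values, g)
        vector.extend([c, err])
    return vector


def best_class(metric, sizes):
    """Name of the candidate with the smallest relative fitting error."""
    values = [metric(n) for n in sizes]
    return min(
        CANDIDATES,
        key=lambda name: fit_constant(sizes, values, CANDIDATES[name])[1],
    )


# Two toy "programs" whose metric is a deterministic step count
# (a stand-in for the dynamically measured metrics in the abstract).
def count_linear_scan(n):
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_pairwise(n):
    steps = 0
    for i in range(n):
        for _ in range(i + 1, n):
            steps += 1
    return steps


SIZES = [16, 32, 64, 128, 256]

if __name__ == "__main__":
    print(best_class(count_linear_scan, SIZES), embed(count_linear_scan, SIZES))
    print(best_class(count_pairwise, SIZES), embed(count_pairwise, SIZES))
```

A vector like the one returned by `embed` could then serve as the feature row for a gradient-boosted classifier such as XGBoost. Counting steps instead of timing keeps the sketch deterministic; a real profiler would measure noisier metrics.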

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research · Parallel Computing and Optimization Techniques