So Much in So Little: Creating Lightweight Embeddings of Python Libraries
Yaroslav Golubev, Egor Bogomolov, Egor Bulychev, Timofey Bryksin

TL;DR
This paper develops lightweight 32-dimensional embeddings of Python libraries based on project dependencies, demonstrates their effectiveness for library recommendation, and provides a benchmark and insights from a user study.
Contribution
It introduces a novel embedding method for Python libraries using SVD on co-occurrence data and applies it to improve library recommendations.
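A minimal sketch of the core idea, assuming a binary project-by-library matrix and NumPy's dense SVD. The exact filtering, weighting, and scaling used in the paper are not given here, so scaling the right singular vectors by the singular values is an assumption, and `library_embeddings` is a hypothetical helper name.

```python
import numpy as np


def library_embeddings(projects, dim=32):
    """Embed libraries via truncated SVD of a binary project x library matrix.

    projects: list of sets of library names, one set per project.
    Returns (libraries, embeddings), where embeddings[i] is the vector for libraries[i].
    """
    libraries = sorted(set().union(*projects))
    index = {lib: i for i, lib in enumerate(libraries)}
    # Rows are projects, columns are libraries; 1.0 means "project depends on library".
    matrix = np.zeros((len(projects), len(libraries)))
    for row, deps in enumerate(projects):
        for lib in deps:
            matrix[row, index[lib]] = 1.0
    # Thin SVD: matrix = U @ diag(s) @ Vt; each column of Vt corresponds to one library.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(dim, len(s))
    # Assumed scaling: right singular vectors weighted by their singular values.
    embeddings = vt[:k].T * s[:k]
    return libraries, embeddings


# Toy usage: four projects, seven distinct libraries, 2-dimensional embeddings.
projects = [
    {"numpy", "pandas", "matplotlib"},
    {"numpy", "scipy"},
    {"flask", "requests"},
    {"django", "requests"},
]
libs, emb = library_embeddings(projects, dim=2)
```

For thousands of projects and libraries, a sparse matrix with a truncated solver such as `scipy.sparse.linalg.svds` would avoid computing the full decomposition.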
Findings
The embeddings outperform popularity-based baselines for library recommendation.
The approach captures semantic relations among libraries.
A user study shows that suggestion quality varies across domains.
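To illustrate the kind of comparison involved, here is a hypothetical recommender that ranks libraries a project does not yet use by cosine similarity to the mean embedding of its current dependencies, next to a baseline that simply suggests the most-used libraries. Both functions, the toy vectors, and the usage counts are invented for illustration; this is not the paper's recommendation model or evaluation protocol.

```python
import numpy as np


def recommend_by_embedding(project_libs, embeddings, top_n=3):
    """Hypothetical: rank unused libraries by cosine similarity to the project's mean embedding."""
    known = [embeddings[lib] for lib in project_libs if lib in embeddings]
    if not known:
        return []
    centroid = np.mean(known, axis=0)
    centroid /= np.linalg.norm(centroid) or 1.0  # unit-length centroid
    scores = {}
    for lib, vec in embeddings.items():
        if lib in project_libs:
            continue
        norm = np.linalg.norm(vec)
        scores[lib] = float(vec @ centroid / norm) if norm else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def recommend_by_popularity(project_libs, counts, top_n=3):
    """Baseline: the most frequently used libraries the project does not already depend on."""
    candidates = [lib for lib in counts if lib not in project_libs]
    return sorted(candidates, key=counts.get, reverse=True)[:top_n]


# Invented 2-D vectors and usage counts, for illustration only.
embeddings = {
    "numpy": np.array([1.0, 0.1]),
    "pandas": np.array([0.9, 0.2]),
    "scipy": np.array([0.95, 0.0]),
    "flask": np.array([0.0, 1.0]),
    "requests": np.array([0.1, 0.9]),
}
counts = {"requests": 50, "numpy": 40, "flask": 30, "pandas": 20, "scipy": 10}
project = {"numpy", "pandas"}

print(recommend_by_embedding(project, embeddings, top_n=2))
print(recommend_by_popularity(project, counts, top_n=2))
```

For a data-analysis project, the embedding-based ranking puts `scipy` first, while the popularity baseline suggests `requests` and `flask` regardless of what the project already does.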
Abstract
In software engineering, different approaches and machine learning models leverage different types of data: source code, textual information, historical data. An important part of any project is its dependencies. The list of dependencies is relatively small but carries a lot of semantics, which can be used to compare projects or make judgements about them. In this paper, we focus on Python projects and their PyPI dependencies in the form of requirements.txt files. We compile a dataset of 7,132 Python projects and their dependencies, and use Git to retrieve their versions from previous years. Using this data, we build 32-dimensional embeddings of libraries by applying Singular Value Decomposition to the co-occurrence matrix of projects and libraries. We then cluster the embeddings and study their semantic relations. To showcase the usefulness of such lightweight library…
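Before a project-library matrix can be built, each requirements.txt has to be reduced to a set of package names. A simplified sketch, assuming version specifiers, extras, and environment markers can be discarded and that names are normalized as in PEP 503; it skips pip options such as `-r` and VCS/URL lines rather than implementing pip's full requirements-file format. `parse_requirements` is a hypothetical name, not part of the paper.

```python
import re

# Leading distribution name, e.g. "pandas" in "pandas[excel]>=1.0; python_version>'3.6'".
NAME_RE = re.compile(r"^([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def parse_requirements(text):
    """Return the set of normalized package names in a requirements.txt body (simplified)."""
    names = set()
    for line in text.splitlines():
        # Drop comments (simplification: any "#" starts a comment) and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines, pip options (-r, -e, --index-url, ...) and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = NAME_RE.match(line)
        if match:
            # PEP 503 normalization: runs of "-", "_", "." become "-", then lowercase.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


SAMPLE = """
# core deps
numpy==1.21.0
Flask>=2.0  # web
scikit_learn
pandas[excel]>=1.0; python_version>'3.6'
-r dev.txt
git+https://github.com/example/pkg.git#egg=pkg
"""

print(sorted(parse_requirements(SAMPLE)))
```

Normalizing names matters here because `scikit_learn` and `scikit-learn` would otherwise become two separate columns of the matrix.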
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Computational Physics and Python Applications
