All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs
Vitaliy Bibaev, Alexey Kalina, Vadim Lomshakov, Yaroslav Golubev, Alexander Bezzubov, Nikita Povarov, Timofey Bryksin

TL;DR
This paper presents a privacy-compliant method for improving code completion in IDEs by learning from anonymized user logs, resulting in more efficient suggestions and fewer keystrokes needed.
Contribution
It introduces a novel approach to collect and utilize anonymized IDE usage logs for training a machine learning ranking model to enhance code completion.
Findings
Significant reduction in typing actions for code completion
Effective ranking model trained on user logs improves suggestions
Method complies with privacy and legal standards
Abstract
In this work, we propose an approach for collecting completion usage logs from the users in an IDE and using them to train a machine learning based model for ranking completion candidates. We developed a set of features that describe completion candidates and their context, and deployed their anonymized collection in the Early Access Program of IntelliJ-based IDEs. We used the logs to collect a dataset of code completions from users, and employed it to train a ranking CatBoost model. Then, we evaluated it in two settings: on a held-out set of the collected completions and in a separate A/B test on two different groups of users in the IDE. Our evaluation shows that using a simple ranking model trained on the past user behavior logs significantly improved code completion experience. Compared to the default heuristics-based ranking, our model demonstrated a decrease in the number of typing…
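The offline part of the evaluation described in the abstract (comparing a learned ranker against the default heuristics on held-out completions) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the feature names `prefix_match` and `recently_used`, the `CompletionSession` structure, the two scoring functions, and the choice of mean reciprocal rank as the metric are hypothetical and are not taken from the paper. The paper trains a CatBoost model; this sketch substitutes a hand-written linear scorer so that it runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class CompletionSession:
    """One logged completion popup (hypothetical structure).

    Holds only numeric, anonymized features per candidate and the
    index of the candidate the user finally selected.
    """
    candidates: List[Features]
    selected: int


def rank_of_selected(session: CompletionSession,
                     score: Callable[[Features], float]) -> int:
    """0-based position of the selected candidate after sorting by score, best first."""
    order = sorted(range(len(session.candidates)),
                   key=lambda i: -score(session.candidates[i]))
    return order.index(session.selected)


def mean_reciprocal_rank(sessions: List[CompletionSession],
                         score: Callable[[Features], float]) -> float:
    """Average of 1 / (rank + 1) over sessions; 1.0 means the pick was always on top."""
    return sum(1.0 / (rank_of_selected(s, score) + 1) for s in sessions) / len(sessions)


def heuristic_score(f: Features) -> float:
    # Stand-in for a default heuristic ranking: prefix match only (assumption).
    return f["prefix_match"]


def learned_score(f: Features) -> float:
    # Stand-in for a trained model: weights are made up, not learned from data.
    return 0.5 * f["prefix_match"] + 1.0 * f["recently_used"]


# Two toy held-out sessions (fabricated for the example, not real log data).
sessions = [
    CompletionSession(
        candidates=[{"prefix_match": 1.0, "recently_used": 0.0},
                    {"prefix_match": 0.8, "recently_used": 1.0}],
        selected=1),
    CompletionSession(
        candidates=[{"prefix_match": 1.0, "recently_used": 0.0},
                    {"prefix_match": 0.5, "recently_used": 0.0}],
        selected=0),
]

print(mean_reciprocal_rank(sessions, heuristic_score))  # 0.75
print(mean_reciprocal_rank(sessions, learned_score))    # 1.0
```

In this toy data the stand-in learned scorer places the user's choice first in both sessions, while the stand-in heuristic misses it once. A real comparison would use the held-out logs and the trained model in place of these placeholders.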
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Mobile Crowdsensing and Crowdsourcing
