Kotlin ML Pack: Technical Report
Sergey Titov, Mikhail Evtikhiev, Anton Shapkin, Oleg Smirnov, Sergei Boytsov, Dariia Karaeva, Maksim Sheptyakov, Mikhail Arkhipov, Timofey Bryksin, Egor Bogomolov

TL;DR
This paper introduces new Kotlin datasets and shows that small, high-quality datasets can substantially improve code generation models, with gains of up to 16 points in pass rate on the HumanEval benchmark.
Contribution
It presents three novel Kotlin datasets, fine-tunes CodeLlama and DeepSeek models on this data, and rewrites the HumanEval benchmark in Kotlin to evaluate them, showing substantial performance gains.
Findings
Up to 16-point increase in HumanEval pass rate
High-quality small datasets improve model performance
Kotlin version of HumanEval, with solutions and tests rewritten by human experts
Abstract
In this technical report, we present three novel datasets of Kotlin code: KStack, KStack-clean, and KExercises. We also describe the results of fine-tuning CodeLlama and DeepSeek models on this data. Additionally, we present a version of the HumanEval benchmark rewritten into Kotlin by human experts, including both the solutions and the tests. Our results demonstrate that small, high-quality datasets (KStack-clean and KExercises) can significantly improve model performance on code generation tasks, achieving up to a 16-point increase in pass rate on the HumanEval benchmark. Lastly, we discuss potential future work on improving language modeling for Kotlin, including the use of static analysis tools in the learning process and the introduction of more intricate and realistic benchmarks.
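The gain above is stated in points, i.e. as a difference between two percentages rather than a relative increase. The Python sketch below illustrates that reading under an assumption the abstract does not spell out: that pass rate means the percentage of benchmark problems whose single generated solution passes all of its tests (pass@1). The function names `pass_rate` and `point_gain` and the toy outcomes are hypothetical and are not the authors' evaluation harness or results.

```python
# Minimal sketch, assuming "pass rate" = percentage of problems whose single
# generated solution passes every test (pass@1). Names and data are hypothetical.

def pass_rate(outcomes: dict[str, bool]) -> float:
    """Return the share of problems marked as passing, in percent."""
    if not outcomes:
        raise ValueError("no problems were evaluated")
    passed = sum(outcomes.values())  # True counts as 1, False as 0
    return 100.0 * passed / len(outcomes)


def point_gain(base: dict[str, bool], tuned: dict[str, bool]) -> float:
    """Absolute difference in pass rate, in percentage points, between two runs."""
    if base.keys() != tuned.keys():
        raise ValueError("both runs must cover the same problems")
    return pass_rate(tuned) - pass_rate(base)


if __name__ == "__main__":
    # Toy outcomes for four hypothetical problems, before and after fine-tuning.
    base = {"task_0": True, "task_1": False, "task_2": False, "task_3": False}
    tuned = {"task_0": True, "task_1": True, "task_2": False, "task_3": False}
    print(pass_rate(base))          # 25.0
    print(pass_rate(tuned))         # 50.0
    print(point_gain(base, tuned))  # 25.0
```

In this toy run the pass rate doubles (a 100% relative increase) while the absolute gain is 25 points, which is why reporting the difference in points keeps the two notions apart.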
