SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation

Ivan Petrukha; Yana Kurliak; Nataliia Stulova

arXiv:2505.24324·cs.LG·June 2, 2025

TL;DR

SwiftEval is a dedicated benchmark for evaluating LLM-generated Swift code. It addresses the limitations of existing Python-centric benchmarks and shows that model scores drop on tasks requiring language-specific features.

Contribution

We created the first Swift-specific benchmark with 28 hand-crafted problems and evaluated 44 LLMs, highlighting the need for language-specific evaluation tools.

Findings

01. LLMs perform worse on Swift-specific tasks

02. Smaller models show more significant performance drops

03. Existing benchmarks are inadequate for Swift evaluation

Abstract

In recent years, large language models (LLMs) have showcased significant advancements in code generation. However, most evaluation benchmarks are primarily oriented towards Python, making it difficult to evaluate other programming languages, such as Swift, with high quality. By examining widely established multilingual benchmarks like HumanEval-XL and MultiPL-E, we identified critical issues specific to their Swift components, making them insufficient or even irrelevant for assessing LLM coding capabilities on Swift. Unlike these existing approaches, which prioritize rapid scaling and generalization by automatically translating Python-centric benchmarks with LLMs, we adopt a quality-over-quantity methodology. We present SwiftEval, the first Swift-oriented benchmark consisting of 28 carefully hand-crafted problems, and evaluate 44 popular Code LLMs on it. Our results show significant LLM…

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing