SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation
Ivan Petrukha, Yana Kurliak, Nataliia Stulova

TL;DR
SwiftEval introduces a dedicated Swift programming language benchmark to accurately evaluate LLM-generated code, addressing limitations of existing Python-centric benchmarks and revealing performance drops in language-specific tasks.
Contribution
We created the first Swift-specific benchmark with 28 hand-crafted problems and evaluated 44 LLMs, highlighting the need for language-specific evaluation tools.
Findings
LLMs perform worse on Swift-specific tasks
Smaller models show larger performance drops
Existing benchmarks are inadequate for Swift evaluation
Abstract
In recent years, large language models (LLMs) have made significant advances in code generation. However, most evaluation benchmarks are oriented primarily towards Python, making high-quality evaluation of other programming languages, such as Swift, difficult. By examining widely established multilingual benchmarks like HumanEval-XL and MultiPL-E, we identified critical issues specific to their Swift components that render them insufficient or even irrelevant for assessing LLM coding capabilities on Swift. Unlike these existing approaches, which prioritize rapid scaling and generalization by automatically translating Python-centric benchmarks with LLMs, we adopt a quality-over-quantity methodology. We present SwiftEval, the first Swift-oriented benchmark consisting of 28 carefully hand-crafted problems, and evaluate 44 popular Code LLMs on it. Our results show significant LLM…
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
Methods: ADaptive gradient method with the OPTimal convergence rate
