SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Samuel Miserendino, Michele Wang, Tejal Patwardhan, Johannes Heidecke

TL;DR
SWE-Lancer is a comprehensive benchmark of over 1,400 real-world freelance software engineering tasks valued at $1 million, designed to evaluate the capabilities and economic potential of frontier language models in practical software engineering scenarios.
Contribution
This paper introduces SWE-Lancer, the first large-scale, real-world software engineering benchmark with monetary valuation, including both technical and managerial tasks, and provides an open-source evaluation platform.
Findings
Frontier models struggle to solve most tasks.
Benchmark covers diverse engineering and managerial tasks.
Open-source tools facilitate future AI research in software engineering.
Abstract
We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. SWE-Lancer encompasses both independent engineering tasks (ranging from $50 bug fixes to $32,000 feature implementations) and managerial tasks, where models choose between technical implementation proposals. Independent tasks are graded with end-to-end tests triple-verified by experienced software engineers, while managerial decisions are assessed against the choices of the original hired engineering managers. We evaluate model performance and find that frontier models are still unable to solve the majority of tasks. To facilitate future research, we open-source a unified Docker image and a public evaluation split, SWE-Lancer Diamond (https://github.com/openai/SWELancer-Benchmark). By mapping model performance to monetary value, we…
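The abstract describes two grading rules and a mapping from model performance to dollars. The sketch below shows one way such scoring could be computed, under our own assumptions. A task's payout counts as earned only if the task is solved. For an engineering task, that means every end-to-end test passes. For a managerial task, it means the model picks the same proposal the original manager chose. The `Task` and `Attempt` records, their field names, and all sample tasks and payouts are hypothetical. None of them are taken from the SWE-Lancer codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one benchmark task (not the official schema)."""
    task_id: str
    payout_usd: float
    # Only set for managerial tasks: index of the proposal the original manager chose.
    manager_choice: Optional[int] = None


@dataclass(frozen=True)
class Attempt:
    """Hypothetical record of a single model attempt on one task."""
    task_id: str
    all_tests_passed: bool = False         # engineering tasks: end-to-end test outcome
    chosen_proposal: Optional[int] = None  # managerial tasks: proposal the model picked


def is_solved(task: Task, attempt: Attempt) -> bool:
    if task.manager_choice is not None:
        # Managerial task: solved only if the model matches the original manager's choice.
        return attempt.chosen_proposal == task.manager_choice
    # Engineering task: solved only if every end-to-end test passes (no partial credit).
    return attempt.all_tests_passed


def summarize(tasks: List[Task], attempts: List[Attempt]) -> Dict[str, float]:
    by_id = {a.task_id: a for a in attempts}
    # Tasks with no recorded attempt count as unsolved.
    solved = [t for t in tasks if t.task_id in by_id and is_solved(t, by_id[t.task_id])]
    earned = sum(t.payout_usd for t in solved)
    total = sum(t.payout_usd for t in tasks)
    return {
        "solved": len(solved),
        "pass_rate": len(solved) / len(tasks) if tasks else 0.0,
        "earned_usd": earned,
        "earn_rate": earned / total if total else 0.0,
    }


# Hypothetical example: a $50 bug fix, a $32,000 feature, and a $1,000 managerial task.
tasks = [
    Task("bug-fix", 50.0),
    Task("feature", 32_000.0),
    Task("manager", 1_000.0, manager_choice=2),
]
attempts = [
    Attempt("bug-fix", all_tests_passed=True),
    Attempt("feature", all_tests_passed=False),
    Attempt("manager", chosen_proposal=2),
]
report = summarize(tasks, attempts)
print(report)
```

In this made-up example, the model solves 2 of 3 tasks, a 67% pass rate. It nonetheless earns only $1,050 of the $33,050 available, about 3%. Reporting both numbers matters because dollar weighting changes the picture. Under this scheme, a single $32,000 feature counts as much as 640 bug fixes at $50 each.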
