Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale

David Noever; Forrest McKee

arXiv:2505.13511 · cs.AI · May 21, 2025

TL;DR

This paper introduces a scalable benchmark for evaluating large language models as autonomous freelance programmers, measuring their task success and earnings on synthetic economic data tasks.

Contribution

It presents a novel, automated benchmarking framework for assessing LLMs on freelance programming tasks with monetary valuation, enabling scalable performance analysis.

Findings

01

Claude 3.5 Haiku earns approximately $1.52 million

02

GPT-4o-mini earns approximately $1.49 million

03

Models rarely fail completely on tasks

Abstract

This study explores Large Language Models (LLMs) as autonomous agents for real-world tasks, including freelance software development. This work presents a new benchmark that evaluates LLMs on freelance programming and data analysis tasks derived from economic data. We construct the benchmark using synthetic tasks created from a Kaggle Freelancer dataset of job postings, with all job prices standardized to USD (median fixed-project price around $250, and an average of $306). Each task is accompanied by structured input-output test cases and an estimated price tag, enabling automated correctness checking and a monetary performance valuation. This approach is inspired by OpenAI's recent SWE-Lancer benchmark (1,400 real Upwork tasks worth $1M total). Still, our framework simplifies evaluation using programmatically testable tasks and predicted price values, making it highly scalable and…
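The abstract's pairing of test cases with a price tag can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Task` class, the function names, and especially the all-or-nothing payout rule (a task pays its full price only when every test case passes) are hypothetical and are not taken from the paper, whose actual scoring rule is not stated in the truncated abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    """Hypothetical benchmark task: a price tag plus input/output test cases."""
    price_usd: float
    test_cases: List[Tuple[str, str]]  # (input, expected_output) pairs


def passes_all_tests(solve: Callable[[str], str], task: Task) -> bool:
    """Return True only if the solver matches every expected output."""
    for given, expected in task.test_cases:
        try:
            if solve(given).strip() != expected.strip():
                return False
        except Exception:
            # A crashing solution counts as a failed test case.
            return False
    return True


def total_earnings(solve: Callable[[str], str], tasks: List[Task]) -> float:
    """Sum the prices of fully solved tasks (assumed all-or-nothing payout)."""
    return sum(t.price_usd for t in tasks if passes_all_tests(solve, t))


# Toy tasks; the prices reuse the median and average quoted in the abstract.
tasks = [
    Task(price_usd=250.0, test_cases=[("2 3", "5"), ("10 -4", "6")]),
    Task(price_usd=306.0, test_cases=[("1 1", "3")]),  # the toy solver fails this
]


def add_solver(text: str) -> str:
    """Stand-in for a model-generated solution: adds two integers."""
    a, b = text.split()
    return str(int(a) + int(b))


earned = total_earnings(add_solver, tasks)
print(f"Earned ${earned:.2f} of ${sum(t.price_usd for t in tasks):.2f}")
```

Under this assumed rule, the toy solver is paid for the first task only. A partial-credit scheme (for example, paying in proportion to the fraction of test cases passed) would be an equally plausible variant and would change the totals.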

Taxonomy

Topics: Digital Economy and Work Transformation · Retirement, Disability, and Employment · AI and HR Technologies