Remote Labor Index: Measuring AI Automation of Remote Work

Mantas Mazeika; Alice Gatti; Cristina Menghini; Udari Madhushani Sehwag; Shivam Singhal; Yury Orlovskiy; Steven Basart; Manasi Sharma; Denis Peskoff; Elaine Lau; Jaehyuk Lim; Lachlan Carroll; Alice Blair; Vinaya Sivakumar; Sumana Basu; Brad Kenstler; Yuntao Ma; Julian Michael; Xiaoke Li; Oliver Ingebretsen; Aditya Mehta; Jean Mottola; John Teichmann; Kevin Yu; Zaina Shaik; Adam Khoja; Richard Ren; Jason Hausenloy; Long Phan; Ye Htet; Ankit Aich; Tahseen Rabbani; Vivswan Shah; Andriy Novykov; Felix Binder; Kirill Chugunov; Luis Ramirez; Matias Geralnik; Hernán Mesura; Dean Lee; Ed-Yeremai Hernandez Cardona; Annette Diamond; Summer Yue; Alexandr Wang; Bing Liu; Ernesto Hernandez; and Dan Hendrycks

arXiv:2510.26787 · cs.LG · October 31, 2025





TL;DR

The paper introduces the Remote Labor Index (RLI), a multi-sector benchmark for evaluating how well AI agents can automate real-world remote work. Current agents perform poorly: the best one reaches an automation rate of only 2.5%. These results give discussions of AI's economic impact an empirical footing.

Contribution

It presents RLI, a new multi-sector benchmark of real-world remote work projects for assessing AI automation, bridging measured research progress and its economic implications.

Findings

01 AI agents perform near the floor on RLI.

02 The highest-performing agent achieves an automation rate of 2.5%.

03 RLI provides an empirical basis for discussions of AI automation.

Abstract

AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.
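The headline number above is an automation rate, i.e. a percentage of benchmark projects. As a minimal sketch of the arithmetic, assuming the rate is simply the share of projects whose AI deliverable was judged acceptable, it can be computed as below. The `ProjectResult` record, its `accepted` field, and the 40-project example are hypothetical illustrations, not the paper's evaluation protocol or data.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    """Hypothetical per-project evaluation record (not the RLI schema)."""
    project_id: str
    # Assumed criterion: True if the AI deliverable was judged acceptable.
    accepted: bool


def automation_rate(results: list[ProjectResult]) -> float:
    """Return the percentage of projects whose AI deliverable was accepted."""
    if not results:
        raise ValueError("need at least one project result")
    accepted = sum(1 for r in results if r.accepted)
    return 100.0 * accepted / len(results)


# Hypothetical example: 1 accepted deliverable out of 40 projects.
results = [ProjectResult(f"p{i}", accepted=(i == 0)) for i in range(40)]
print(f"{automation_rate(results):.1f}%")  # prints "2.5%"
```

Under this assumed definition, a 2.5% rate means roughly one accepted deliverable in every forty projects attempted.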


Code & Models

Datasets

cais/rli-public-set (dataset; 309 downloads)
