1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

Robin-Nico Kampa; Fabian Deuser; Anna Bößendörfer; Konrad Habel; Norbert Oswald

arXiv:2605.17046 · cs.LG · May 20, 2026

TL;DR

The paper introduces 1GC-7RC, a benchmark that evaluates AI coding agents on seven diverse ML tasks, each run on a single GPU within a task-specific wall-clock budget.

Contribution

The paper presents a standardized, modular benchmark in which each task ships with locked data-preparation and evaluation scripts plus a baseline training script, so that autonomous AI coding agents can be compared under identical conditions.

Findings

1. The seven evaluated agents differ substantially in performance.
2. The benchmark exposes differences in the agents' ML knowledge and planning ability.
3. All evaluation artifacts are publicly available for reproducibility.

Abstract

Autonomous AI coding agents are becoming a core tool for ML practitioners in industry and research alike. Despite this growing adoption, no standardized benchmark exists to evaluate their ability to design, implement, and train models from scratch across diverse domains. We introduce **1GC-7RC** (*Single Graphic Card: Seven Research Challenges*), a benchmark comprising seven ML tasks spanning language modeling, image classification, semantic segmentation, graph learning, tabular prediction, time-series forecasting, and text classification. Each task provides a locked data-preparation and evaluation script together with a baseline training script; the agent may only modify the training code, has no access to pretrained weights (with one controlled exception for semantic segmentation), no internet access, and must complete each task within a task-specific wall-clock budget (40-120…
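The wall-clock constraint described in the abstract can be illustrated with a minimal sketch. It is not the benchmark's own harness: the function name `run_with_budget`, the command passed to it, and the budget value are all hypothetical, and the sketch assumes the training code is started as a separate process that can simply be stopped once the budget is spent.

```python
import subprocess
import sys
import time


def run_with_budget(cmd, budget_seconds):
    """Run `cmd` as a child process under a wall-clock budget.

    Hypothetical helper for illustration only; it is not part of the
    1GC-7RC repository. Returns (finished_in_time, elapsed_seconds).
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it is still running after `budget_seconds`.
        subprocess.run(cmd, timeout=budget_seconds, check=False)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    elapsed = time.monotonic() - start
    return finished, elapsed
```

A call such as `run_with_budget([sys.executable, "train.py"], budget_seconds)` would then report whether the (assumed) training script finished inside its budget. The locked evaluation script would be run separately afterwards, so that only training time counts against the limit; that separation is also an assumption of this sketch.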

Code & Models

Repositories

Strolchii/1GC-7RC-Benchmark (GitHub)