Aletheia tackles FirstProof autonomously

Tony Feng; Junehyuk Jung; Sang-hyun Kim; Carlo Pagano; Sergei Gukov; Chiang-Chiang Tsai; David Woodruff; Adel Javanmard; Aryan Mokhtari; Dawsen Hwang; Yuri Chervonyi; Jonathan N. Lee; Garrett Bingham; Trieu H. Trinh; Vahab Mirrokni; Quoc V. Le; Thang Luong

arXiv:2602.21201 · cs.AI · March 17, 2026

TL;DR

Aletheia, a mathematics research agent powered by Gemini 3 Deep Think, autonomously solved 6 of the 10 problems in the inaugural FirstProof challenge, according to majority expert assessment.

Contribution

This paper reports Aletheia's performance on the inaugural FirstProof challenge, where it worked without human intervention. It also explains the authors' interpretation of the challenge and discloses how the experiments were run and how the outputs were evaluated.

Findings

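1. Aletheia autonomously solved 6 of the 10 problems (2, 5, 7, 8, 9, 10), according to majority expert assessment.

2. Expert assessments were unanimous on every problem except Problem 8.

3. Raw prompts, outputs, and evaluation details are disclosed for full transparency.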

Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 of the 10 problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessments; experts were unanimous on every problem except Problem 8. For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.
