Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

Natapong Nitarach

arXiv:2603.27844 · cs.CL · April 17, 2026

TL;DR

This paper shows that model capability is the main factor in inference-time optimization for mathematical reasoning. Prompt-level interventions yield no gains, and the remaining headroom lies in selecting among sampled answers (for example, with a verifier-based selector) rather than in prompt engineering.

Contribution

It demonstrates that model capability outweighs prompt strategies in inference-time optimization, which makes model selection more important than prompt engineering.

Findings

1. High-temperature sampling reduces error correlation.
2. The capability gap between models explains majority-voting performance.
3. Prompt engineering cannot significantly improve results.
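The first finding concerns error correlation between voters. One standard way to see why correlation limits the benefit of voting (an illustrative model, not necessarily the metric the paper uses) is the effective sample size of N attempts whose errors share a common pairwise correlation rho: N_eff = N / (1 + (N - 1) * rho). The sketch below assumes equal pairwise correlation, which real LLM samples need not satisfy; the function name is hypothetical.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Effective number of independent votes for n equicorrelated attempts.

    Assumes every pair of attempts has the same error correlation rho
    (an illustrative simplification, not a quantity measured in the paper).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # rho must lie in [-1/(n-1), 1] for n equicorrelated variables to exist.
    if not -1.0 / max(n - 1, 1) <= rho <= 1.0:
        raise ValueError("rho is outside the valid range for n attempts")
    return n / (1.0 + (n - 1) * rho)


for rho in (0.0, 0.2, 0.5, 1.0):
    print(f"rho={rho:.1f}: N_eff = {effective_sample_size(8, rho):.2f}")
# rho=0.0: N_eff = 8.00
# rho=0.2: N_eff = 3.33
# rho=0.5: N_eff = 1.78
# rho=1.0: N_eff = 1.00
```

At N = 8, even a modest correlation of 0.2 leaves only about 3.3 effective votes under this model, which is why decorrelating errors, whether through temperature or through prompt diversity, looks attractive in the first place.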

Abstract

Majority voting over multiple LLM attempts improves mathematical reasoning, but correlated errors limit the effective sample size. A natural fix is to assign different reasoning strategies to different voters. The approach, Diverse Prompt Mixer, is tested on the AIMO 3 competition: 3 models, 23+ experiments, 50 IMO-level problems, one H100 80 GB, 5-hour limit. Every prompt-level intervention fails. High-temperature sampling already decorrelates errors; weaker strategies reduce accuracy more than they reduce correlation. Across an 8-point capability gap at equal N=8 and every optimization tested, model capability dominates. The gap between the best majority-vote score (42/50) and pass@20 (~45.5) is selection loss, not prompt loss. A verifier-based selector could close it. Prompt engineering cannot.
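The abstract attributes the gap between the best majority-vote score and pass@20 to selection loss. A minimal sketch of how such a gap could be measured, assuming each problem comes with a list of sampled final answers and a known gold answer: majority voting commits to one answer per problem, while pass@k credits a problem if any of k attempts is correct, computed here with the standard unbiased estimator 1 - C(n - c, k) / C(n, k). The helper names and data are hypothetical toy values, not results from the paper.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Most common answer; Counter.most_common breaks ties by first occurrence."""
    return Counter(answers).most_common(1)[0][0]


def pass_at_k(n, c, k):
    """Unbiased pass@k: chance that k of n attempts (c of them correct) include a correct one."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


def selection_loss(problems, k):
    """problems: list of (answers, gold). Returns (majority score, pass@k score, gap)."""
    majority = sum(majority_vote(answers) == gold for answers, gold in problems)
    oracle = sum(
        pass_at_k(len(answers), sum(a == gold for a in answers), k)
        for answers, gold in problems
    )
    return majority, oracle, oracle - majority


# Hypothetical toy data: 3 problems, 4 attempts each.
problems = [
    ([42, 42, 17, 42], 42),  # majority vote is right
    ([7, 3, 3, 3], 7),       # a correct attempt exists but is outvoted
    ([10, 11, 12, 13], 99),  # no attempt is correct
]
maj, pk, gap = selection_loss(problems, k=4)
print(f"majority={maj}, pass@4={pk:.1f}, selection loss={gap:.1f}")
# majority=1, pass@4=2.0, selection loss=1.0
```

Only the second problem contributes selection loss: a correct answer was sampled but outvoted. A verifier-based selector, as the abstract suggests, would replace `majority_vote` with a scoring rule over candidate answers, targeting exactly this term.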

Code & Models

Repository: nat-nischw/model-capability-dominates-lessons-aimo3 (GitHub)