Note on Selection Bias in Observational Estimates of Algorithmic Progress

Parker Whitfill

arXiv:2508.11033 · econ.GN · August 19, 2025

TL;DR

This note highlights a potential selection bias in estimates of algorithmic progress drawn from observational data, which calls into question conclusions about efficiency improvements over time.

Contribution

It identifies a methodological problem: endogenous compute choices may bias estimates of algorithmic progress that are based on observational loss data.

Findings

1. Identifies potential bias in estimates of algorithmic progress arising from endogenous compute choices
2. Raises awareness of selection bias in observational studies of algorithmic efficiency
3. Calls for improved methods to measure algorithmic improvements over time accurately

Abstract

Ho et al. (2024) attempt to estimate the degree of algorithmic progress in language models. They collect observational data on language models' loss and compute over time, and argue that as time has passed, language models' algorithmic efficiency has been rising. That is, the loss achieved for fixed compute has been dropping over time. In this note, I raise one potential methodological problem with the estimation strategy. Intuitively, if part of algorithmic quality is latent, and compute choices are endogenous to algorithmic quality, then the resulting estimates of algorithmic quality will be contaminated by selection bias.
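To make the mechanism in the abstract concrete, here is a minimal simulation sketch under loudly hypothetical assumptions. Loss is assumed linear in log compute, release date, and a latent quality term q. Compute choices are assumed to respond to q. Algorithmic progress is then estimated by regressing loss on log compute and date while omitting q. This is not Ho et al.'s scaling-law specification or the note's own model. Every name and parameter value (`simulate`, `selection`, `TRUE_PROGRESS`, and so on) is illustrative, and the toy model frames the problem as omitted-variable bias in a linear regression, a simplification that may differ from the note's formalization.

```python
import random

# All parameter values below are hypothetical, chosen only for illustration.
TRUE_PROGRESS = 0.10   # loss reduction per year at fixed compute
COMPUTE_EFFECT = 0.05  # loss reduction per unit of log compute
LATENT_EFFECT = 0.20   # loss reduction per unit of latent algorithmic quality
COMPUTE_TREND = 0.50   # growth of log compute per year


def ols(X, y):
    """Least-squares coefficients for design rows X and targets y.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of regressors.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n, selection, seed=0):
    """Return (estimated compute effect, estimated progress per year).

    `selection` (hypothetical) sets how strongly log compute responds to
    the latent quality q; selection=0 makes compute exogenous to q.
    """
    rng = random.Random(seed)
    rows, losses = [], []
    for _ in range(n):
        t = rng.uniform(0.0, 10.0)  # release date, in years
        q = rng.gauss(0.0, 1.0)     # latent algorithmic quality (unobserved)
        log_c = COMPUTE_TREND * t + selection * q + rng.gauss(0.0, 1.0)
        loss = (3.0 - COMPUTE_EFFECT * log_c - TRUE_PROGRESS * t
                - LATENT_EFFECT * q + rng.gauss(0.0, 0.05))
        rows.append([1.0, log_c, t])  # q is omitted: the analyst never sees it
        losses.append(loss)
    _, b_compute, b_time = ols(rows, losses)
    return -b_compute, -b_time  # flip signs so positive means "loss falls"


if __name__ == "__main__":
    for sel in (0.0, 1.0):
        compute_hat, progress_hat = simulate(20_000, selection=sel)
        print(f"selection={sel}: compute effect {compute_hat:.3f}, "
              f"progress/year {progress_hat:.3f} (true {TRUE_PROGRESS})")
```

With `selection=0` the estimates should land near the assumed true values (0.05 for the compute effect, 0.10 per year for progress). With `selection=1`, the population values in this parameterization work out to about 0.15 and 0.05: the compute coefficient absorbs part of the latent-quality effect, and because compute also trends with time, the time coefficient is pulled toward zero. With unit variances for q and the compute noise, estimated progress converges to TRUE_PROGRESS − LATENT_EFFECT · COMPUTE_TREND · k / (k² + 1), where k is `selection`. The sign and size of the bias depend entirely on these assumed parameters; for instance, with `COMPUTE_TREND = 0` the time coefficient in this toy model would be unbiased.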
