Note on Selection Bias in Observational Estimates of Algorithmic Progress
Parker Whitfill

TL;DR
This note identifies a potential selection bias in estimates of algorithmic progress drawn from observational data, which casts doubt on conclusions about how much efficiency has improved over time.
Contribution
It identifies a methodological problem: when compute choices are endogenous, estimates of algorithmic progress based on observational loss data may be biased.
Findings
Potential bias in estimating algorithmic progress due to endogenous compute choices
Raises awareness of selection bias in observational studies of algorithmic efficiency
Calls for improved methods to accurately measure algorithmic improvements over time
Abstract
Ho et al. (2024) attempt to estimate the degree of algorithmic progress in language models. They collect observational data on language models' loss and compute over time, and argue that as time has passed, language models' algorithmic efficiency has been rising. That is, the loss achieved for fixed compute has been dropping over time. In this note, I raise one potential methodological problem with the estimation strategy. Intuitively, if part of algorithmic quality is latent, and compute choices are endogenous to algorithmic quality, then the resulting estimates of algorithmic quality will be contaminated by selection bias.
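To make the mechanism concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not the author's model or the specification used by Ho et al. (2024): the linear loss equation, every parameter value, and the names `simulate`, `ols`, and `estimate_time_coefficient` are hypothetical. Each model has a release time t, a latent algorithmic quality u that the analyst cannot observe, and log compute that labs choose partly in response to u (with strength d). Regressing loss on log compute and time by OLS, the time coefficient (loss change per year at fixed compute) recovers the true progress rate only when d = 0.

```python
import random


def simulate(n, seed, d):
    """Hypothetical data-generating process; all parameters are illustrative assumptions.

    t     : release time in years
    u     : latent algorithmic quality, unobserved by the analyst
    log_c : log training compute; labs scale compute with u when d != 0 (endogeneity)
    loss  : falls with compute, with time (true progress: -0.1 per year), and with u
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.uniform(0.0, 10.0)
        u = rng.gauss(0.0, 1.0)
        log_c = 0.5 * t + d * u + rng.gauss(0.0, 1.0)
        loss = 5.0 - 0.3 * log_c - 0.1 * t - 0.5 * u + rng.gauss(0.0, 0.1)
        rows.append((t, log_c, loss))
    return rows


def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def estimate_time_coefficient(d, n=20000, seed=0):
    """Regress loss on [1, log_c, t]; u is omitted because it is unobserved."""
    rows = simulate(n, seed, d)
    X = [(1.0, log_c, t) for t, log_c, _ in rows]
    y = [loss for _, _, loss in rows]
    _intercept, _b_log_c, b_t = ols(X, y)
    return b_t


if __name__ == "__main__":
    for d in (0.0, 1.0):
        print(f"d = {d}: estimated time coefficient = {estimate_time_coefficient(d):.3f}")
```

Under these assumed parameters, the population value of the estimated time coefficient is -0.1 (the true rate) when d = 0, but about +0.025 when d = 1. The omitted latent quality loads onto log compute, and because compute also trends with time, part of that bias spills into the time coefficient. The sign and size of the bias depend on the assumed parameters.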
