Methods, Data, and Conceptual Change: Reflections from Two Quantitative Diachronic Case Studies
Catherine Wong, Bach Phan-Tat, Susan Fitzmaurice

TL;DR
This paper examines how different quantitative methods in historical linguistics interact with dataset properties, revealing their limitations and the influence of data structure on detecting semantic change.
Contribution
It compares two quantitative approaches side by side, quad-based concept modelling on EEBO-TCP and SynFlow analysis on the Royal Society Corpus, to show how dataset structure influences semantic change detection in historical linguistics.
Findings
Purely lexical, frequency-based methods have limited scope in detecting semantic change.
Dataset structure significantly shapes the interpretability of diachronic linguistic data.
Comparative reflection clarifies methodological limits in historical linguistics.
Abstract
This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, we examine English data using quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M words) alongside SynFlow analysis of scientific writing in Royal Society Corpus 6.0.4 (1750-1799; drawn from a 78.6M-token open corpus). Through parallel comparison, the paper explores how each approach operationalises concepts, the data assumptions they entail, and the diachronic interpretations they support. We argue that comparative methodological reflection clarifies the limits of purely lexical, frequency-based approaches and highlights how dataset structure shapes the kinds of semantic change that quantitative methods can reliably detect.
