Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts
Arno Simons

TL;DR
This study evaluates GPT-5's ability to support interpretative citation context analysis through a detailed, case-based approach, highlighting the influence of prompt design on interpretative outcomes and vocabulary.
Contribution
It shows that prompt scaffolding and framing systematically shift the interpretative hypotheses and vocabulary GPT-5 produces in citation context analysis, and it treats this prompt sensitivity as a methodological issue in its own right.
Findings
GPT-5 reliably classifies citation function as 'supplementary'
Prompt framing influences the diversity of interpretative hypotheses
GPT-5 detects textual hinges and interprets lineage and positioning
Abstract
This paper tests whether large language models (LLMs) can support interpretative citation context analysis (CCA) by scaling in thick, text-grounded readings of a single hard case rather than scaling up typological labels. It foregrounds prompt-sensitivity analysis as a methodological issue by varying prompt scaffolding and framing in a balanced 2x3 design. Using footnote 6 in Chubin and Moitra (1975) and Gilbert's (1977) reconstruction as a probe, I implement a two-stage GPT-5 pipeline: a citation-text-only surface classification and expectation pass, followed by cross-document interpretative reconstruction using the citing and cited full texts. Across 90 reconstructions, the model produces 450 distinct hypotheses. Close reading and inductive coding identify 21 recurring interpretative moves, and linear probability models estimate how prompt choices shift their frequencies and lexical…
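The design and estimation step described in the abstract can be illustrated with a minimal sketch. It assumes that scaffolding is the two-level factor and framing the three-level factor (the abstract only states a balanced 2x3 design), so the 90 reconstructions split into 15 per cell. The level names, the main-effects-only dummy coding, and the synthetic outcome below are all hypothetical stand-ins, not the paper's materials or results. A linear probability model here is simply ordinary least squares on a 0/1 indicator of whether a given interpretative move appears in a reconstruction.

```python
import itertools
import numpy as np

# Hypothetical level names: the abstract only reports a balanced 2x3 design.
SCAFFOLDING = ["scaffold_a", "scaffold_b"]
FRAMING = ["frame_a", "frame_b", "frame_c"]
N_RUNS = 90  # number of reconstructions reported in the abstract


def build_design(n_runs=N_RUNS):
    """Return one (scaffolding, framing) pair per run, equally many per cell."""
    cells = list(itertools.product(SCAFFOLDING, FRAMING))
    per_cell, remainder = divmod(n_runs, len(cells))
    if remainder:
        raise ValueError("n_runs must be divisible by the number of cells")
    return [cell for cell in cells for _ in range(per_cell)]


def design_matrix(runs):
    """Intercept plus dummies for scaffold_b, frame_b and frame_c."""
    return np.array([
        [1.0,
         float(s == SCAFFOLDING[1]),
         float(f == FRAMING[1]),
         float(f == FRAMING[2])]
        for s, f in runs
    ])


def fit_lpm(X, y):
    """Linear probability model: least-squares fit of a 0/1 outcome on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


runs = build_design()
X = design_matrix(runs)

# Synthetic outcome for illustration only: 1 if a (hypothetical) move
# appears in a run, made more likely under scaffold_b by construction.
rng = np.random.default_rng(0)
prob = 0.3 + 0.2 * X[:, 1]
y = (rng.random(len(runs)) < prob).astype(float)

coef = fit_lpm(X, y)
for name, value in zip(["intercept", "scaffold_b", "frame_b", "frame_c"], coef):
    print(f"{name}: {value:+.3f}")
```

Each coefficient reads as a change in the probability that the move appears, relative to the baseline cell (scaffold_a, frame_a). That direct interpretability is the usual reason to prefer a linear probability model over a logit for this kind of frequency comparison.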
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Neurobiology of Language and Bilingualism
