Exploring the applicability of Large Language Models to citation context analysis
Kai Nishikawa, Hitoshi Koshiba

TL;DR
This study evaluates ChatGPT's effectiveness in citation context analysis. Its annotations are at least as consistent as human annotations but show poor predictive performance, which suggests using it as a supplementary tool rather than a replacement for human annotators.
Contribution
It demonstrates the potential and limitations of LLMs like ChatGPT for citation context analysis, providing insights for future methodological development.
Findings
LLMs match or outperform humans in annotation consistency.
LLMs have lower predictive performance than human annotators.
LLMs can serve as reference or supplementary annotators in citation analysis.
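The two evaluation axes named in the findings, consistency and predictive performance, can be illustrated with a minimal sketch. This summary does not state which metrics the paper uses, so Cohen's kappa (for consistency between two annotation runs) and plain accuracy against human reference labels (for predictive performance) are assumptions here. The category names and label sequences are hypothetical toy values, not data from the study.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Chance-corrected agreement between two equally long label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    # Fraction of items on which the two sequences agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each sequence's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1 - expected)


def accuracy(pred, gold):
    """Share of predictions that match the reference labels."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Hypothetical toy labels (assumed categories, not the paper's scheme).
gold = ["background", "method", "background", "comparison", "method", "background"]
llm_run_1 = ["background", "background", "background", "comparison", "background", "background"]
llm_run_2 = list(llm_run_1)  # a second run that repeats the first exactly

consistency = cohen_kappa(llm_run_1, llm_run_2)
performance = accuracy(llm_run_1, gold)
print(f"kappa between runs: {consistency:.2f}")
print(f"accuracy vs. reference: {performance:.2f}")
```

In this toy case the two runs agree perfectly with each other while matching only four of six reference labels, which shows how an annotator can be highly consistent yet still weak in predictive performance.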
Abstract
Unlike traditional citation analysis -- which assumes that all citations in a paper are equivalent -- citation context analysis considers the contextual information of individual citations. However, citation context analysis requires creating large amounts of data through annotation, which hinders the widespread use of this methodology. This study explored the applicability of Large Language Models (LLMs) -- particularly ChatGPT -- to citation context analysis by comparing LLM and human annotation results. The results show that LLM annotation is as good as or better than human annotation in terms of consistency but poor in terms of predictive performance. Thus, having LLMs immediately replace human annotators in citation context analysis is inappropriate. However, the annotation results obtained by LLMs can be used as reference information when narrowing the annotation results…
