Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research
Haowen Xu, Xueping Li, Jose Tupayachi, Jianming (Jamie) Lian, Femi Omitaomu

TL;DR
This paper presents an AI-driven approach using Sentence Transformers and RAG to automate and improve bibliometric analysis, enabling semantic and contextual search for better research characterization in urban science.
Contribution
It introduces a novel workflow combining transformers, RAG, and clustering techniques for enhanced bibliometric analysis of high-impact urban research articles.
Findings
Effective semantic search and topic ranking achieved
Generated insightful summary statistics on research scope and quality
Demonstrated improved characterization of urban science literature
Abstract
Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science, especially in high-impact journals, such as Nature Portfolios. However, traditional methods, which rely on keyword searches and basic NLP techniques, often fail to uncover valuable insights not explicitly stated in article titles or keywords. These approaches support neither semantic search nor contextual understanding, which limits their effectiveness in classifying topics and characterizing studies. In this paper, we address these limitations by leveraging Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis. We developed a technical workflow that integrates a vector database, Sentence Transformers, a Gaussian Mixture Model (GMM), Retrieval Agent, and Large Language Models (LLMs) to enable contextual…
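The retrieval step named in the abstract (embed documents, store the vectors, rank by similarity to a query) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `toy_embed`, `VectorIndex`, and the sample abstracts are hypothetical names and data introduced here. The hashed bag-of-words encoder is a stand-in that captures no semantics; in the workflow described above, a Sentence Transformer model would supply the embeddings, and the GMM clustering and LLM stages are omitted entirely.

```python
import hashlib
import math
from typing import Callable, List, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in encoder: hashed bag-of-words, L2-normalized.

    It is NOT semantic; a real pipeline would call a sentence-embedding model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class VectorIndex:
    """Tiny in-memory substitute for a vector database (illustrative only)."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        self.items: List[Tuple[str, Vector]] = []

    def add(self, doc: str) -> None:
        # Embed once at insertion time and keep the vector next to the text.
        self.items.append((doc, self.embed(doc)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Score every stored document against the query, highest first.
        q = self.embed(query)
        scored = [(cosine(q, vec), doc) for doc, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up abstracts, used only to exercise the index.
abstracts = [
    "urban heat islands raise city temperatures",
    "transit networks and commuting patterns",
    "flood risk mapping for coastal cities",
]
index = VectorIndex(toy_embed)
for abstract in abstracts:
    index.add(abstract)

results = index.search("urban heat islands", k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Because the encoder is passed in as a plain callable, swapping `toy_embed` for a real sentence-embedding function would leave the index and ranking code unchanged.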
