A Hybrid Framework for Subject Analysis: Integrating Embedding-Based Regression Models with Large Language Models
Jinyu Liu, Xiaoying Song, Diana Zhang, Jason Thomale, Daqing He, Lingzi Hong

TL;DR
This paper introduces a hybrid framework combining embedding-based machine learning models with large language models to improve subject analysis in library systems, reducing hallucinations and aligning outputs with controlled vocabularies.
Contribution
It presents a hybrid approach in which ML models guide LLM predictions and post-edit the generated terms, improving accuracy and control in subject classification tasks.
Findings
Hybrid framework improves subject prediction accuracy.
Guided LLMs produce more controlled and vocabulary-aligned outputs.
Post-editing reduces hallucinations in LLM-generated subject terms.
Abstract
Providing subject access to information resources is an essential function of any library management system. Large language models (LLMs) have been widely used in classification and summarization tasks, but their capability to perform subject analysis is underexplored. Multi-label classification with traditional machine learning (ML) models has been used for subject analysis but struggles with unseen cases. LLMs offer an alternative but often over-generate and hallucinate. Therefore, we propose a hybrid framework that integrates embedding-based ML models with LLMs. This approach uses ML models to (1) predict the optimal number of LCSH labels to guide LLM predictions and (2) post-edit the predicted terms with actual LCSH terms to mitigate hallucinations. We experimented with LLMs and the hybrid framework to predict the subject terms of books using the Library of Congress Subject Headings…
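The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-trigram vectors stand in for whatever embedding model the paper uses, the label count `k` is passed in by hand rather than predicted by a regression model, and the function names (`build_prompt`, `post_edit`), the prompt wording, and the four-term vocabulary are all hypothetical. Truncating the post-edited list to `k` is likewise an assumption made for the example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Toy stand-in for an embedding: a bag of character trigrams (assumption)."""
    padded = f"  {text.lower().strip()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(title, k):
    """Step 1 (hypothetical wording): put the predicted label count into the LLM prompt."""
    return f"Suggest exactly {k} Library of Congress Subject Headings for the book: {title}"


def post_edit(predicted_terms, vocabulary, k):
    """Step 2: replace each LLM term with its nearest controlled-vocabulary term."""
    vocab_vectors = {term: char_ngrams(term) for term in vocabulary}
    edited = []
    for term in predicted_terms:
        vec = char_ngrams(term)
        best = max(vocabulary, key=lambda v: cosine(vec, vocab_vectors[v]))
        if best not in edited:  # drop duplicates created by the mapping
            edited.append(best)
    return edited[:k]  # keep at most k terms (assumption)


# Hypothetical controlled vocabulary and raw LLM output, for illustration only.
vocabulary = [
    "Machine learning",
    "Natural language processing (Computer science)",
    "Libraries--Automation",
    "Subject cataloging",
]
llm_terms = [
    "Machine-learning",
    "Natural language processing",
    "Library automation",
    "Machine learning",
]
print(build_prompt("An example book title", 3))
print(post_edit(llm_terms, vocabulary, 3))
```

Because every returned term is drawn from `vocabulary`, the output cannot contain a heading that does not exist in the controlled list, which is the sense in which post-editing mitigates hallucinated terms.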
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
