A Skill-Based AI Agentic Pipeline for Library of Congress Subject Indexing
Eric H. C. Chow

TL;DR
This paper introduces a modular AI pipeline that automates Library of Congress subject indexing by decomposing the process into four skill-based steps, aligning closely with professional practices.
Contribution
The paper presents a novel, domain-informed AI agentic pipeline for subject indexing, integrating manual guidelines into automated skills for improved accuracy.
Findings
Strong alignment with professional indexing practices
Effective decomposition into four sequential agent skills
Notable differences in specificity and subdivision practices
Abstract
This paper presents a modular AI agentic skill pipeline for automating subject indexing with Library of Congress Subject Headings (LCSH). Subject indexing (the process of analyzing a work's aboutness, selecting controlled vocabulary terms, and encoding them as MARC21 subject access fields) is one of the most time-consuming components of library cataloging. The system decomposes this process into four discrete, sequentially executed agent skills: conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis. Each skill encodes domain knowledge drawn directly from the Library of Congress Subject Headings Manual (SHM) instruction sheets and from subject analysis theory. The pipeline was evaluated against a corpus of ten titles whose existing subject headings were captured from the Harvard Library bibliographic dataset (a snapshot of its Alma ILS). Results…
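The four sequential skills named in the abstract can be pictured as a chain of functions, each consuming the previous one's output. The following is only a minimal sketch under stated assumptions, not the paper's implementation: every name here (`Candidate`, `run_pipeline`, the `coverage` score, the 0.2 threshold, the dictionary standing in for an authority file) is hypothetical, the conceptual analysis step is stubbed with pre-annotated topics where the real system would presumably call a language model, and the sample headings are toy data that have not been checked against the actual LCSH authority file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate subject concept (hypothetical structure)."""
    term: str
    coverage: float            # assumed share of the work about this topic, 0..1
    subdivisions: tuple = ()   # optional general subdivisions


def conceptual_analysis(work):
    # Stub: reads pre-annotated topics instead of analyzing the work's aboutness.
    return [
        Candidate(t["term"], t["coverage"], tuple(t.get("subdivisions", ())))
        for t in work["topics"]
    ]


def quantitative_filtering(candidates, min_coverage=0.2):
    # Keep only topics covering enough of the work.
    # The 0.2 threshold is an assumption of this sketch.
    return [c for c in candidates if c.coverage >= min_coverage]


def authority_validation(candidates, authority):
    # Keep only terms found in the (toy) authority lookup, and replace each
    # term with its authorized form.
    validated = []
    for c in candidates:
        heading = authority.get(c.term.lower())
        if heading is not None:
            validated.append(Candidate(heading, c.coverage, c.subdivisions))
    return validated


def marc_field_synthesis(candidates):
    # Render each heading as a MARC21 650 field in a common text display:
    # first indicator blank (shown as "_"), second indicator 0 for LCSH,
    # $a for the topical term and $x for each general subdivision.
    fields = []
    for c in candidates:
        parts = ["$a " + c.term] + ["$x " + s for s in c.subdivisions]
        fields.append("650 _0 " + " ".join(parts))
    return fields


def run_pipeline(work, authority):
    # Execute the four skills strictly in sequence.
    candidates = conceptual_analysis(work)
    candidates = quantitative_filtering(candidates)
    candidates = authority_validation(candidates, authority)
    return marc_field_synthesis(candidates)


# Toy inputs, invented for illustration only.
sample_work = {
    "topics": [
        {"term": "cataloging", "coverage": 0.6, "subdivisions": ["Automation"]},
        {"term": "artificial intelligence", "coverage": 0.3},
        {"term": "printing", "coverage": 0.05},          # dropped: too minor
        {"term": "robot librarians", "coverage": 0.25},  # dropped: not in lookup
    ]
}
sample_authority = {
    "cataloging": "Cataloging",
    "artificial intelligence": "Artificial intelligence",
    "printing": "Printing",
}

for line in run_pipeline(sample_work, sample_authority):
    print(line)
```

Keeping each skill as a separate function with a plain input and output mirrors the modular decomposition the abstract describes: a step can be inspected, tested, or replaced without touching the others.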
