A Skill-Based AI Agentic Pipeline for Library of Congress Subject Indexing

Eric H. C. Chow

arXiv:2605.03537 · cs.DL · May 6, 2026

TL;DR

This paper introduces a modular AI pipeline that automates Library of Congress subject indexing by decomposing the process into four skill-based steps, aligning closely with professional practices.

Contribution

The paper presents a domain-informed AI agentic pipeline for subject indexing that encodes guidance from the Library of Congress Subject Headings Manual (SHM) instruction sheets into automated agent skills.

Findings

01. Strong alignment with professional indexing practices

02. Effective decomposition into four sequential agent skills

03. Notable differences in specificity and subdivision practices

Abstract

This paper presents a modular AI agentic skill pipeline for automating subject indexing with Library of Congress Subject Headings (LCSH). Subject indexing (the process of analyzing a work's aboutness, selecting controlled vocabulary terms, and encoding them as MARC21 subject access fields) is one of the most time-consuming components of library cataloging. The system decomposes this process into four discrete, sequentially executed agent skills: conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis. Each skill encodes domain knowledge drawn directly from the Library of Congress Subject Headings Manual (SHM) instruction sheets and from subject analysis theory. The pipeline was evaluated against a corpus of ten titles whose existing subject headings were captured from the Harvard Library bibliographic dataset (a snapshot of its Alma ILS). Results…
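
The four-skill decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Record` class, the keyword-spotting stand-in for conceptual analysis, the `MAX_HEADINGS` cap, and the toy authority table are all hypothetical placeholders. A real system would presumably call a language model and an LCSH authority service at these points. The only detail taken from the MARC21 format itself is that field 650 with second indicator 0 marks a Library of Congress Subject Heading.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Record:
    title: str
    summary: str
    concepts: List[str] = field(default_factory=list)     # candidate topics
    headings: List[str] = field(default_factory=list)     # authorized forms
    marc_fields: List[str] = field(default_factory=list)  # encoded 650 fields

# Hypothetical toy data standing in for model output and an authority file.
CANDIDATE_PHRASES = ["cataloging", "subject headings",
                     "artificial intelligence", "robots"]
TOY_AUTHORITY = {
    "cataloging": "Cataloging",
    "subject headings": "Subject headings",
    "artificial intelligence": "Artificial intelligence",
}
MAX_HEADINGS = 3  # assumed cap, not a value from the paper

def _text(rec: Record) -> str:
    return f"{rec.title} {rec.summary}".lower()

def conceptual_analysis(rec: Record) -> Record:
    # Skill 1: identify what the work is about (here: naive phrase spotting).
    rec.concepts = [p for p in CANDIDATE_PHRASES if p in _text(rec)]
    return rec

def quantitative_filtering(rec: Record) -> Record:
    # Skill 2: keep only the most prominent concepts (here: by frequency).
    text = _text(rec)
    ranked = sorted(rec.concepts, key=lambda c: -text.count(c))
    rec.concepts = ranked[:MAX_HEADINGS]
    return rec

def authority_validation(rec: Record) -> Record:
    # Skill 3: keep only concepts that map to an authorized heading.
    rec.headings = [TOY_AUTHORITY[c] for c in rec.concepts if c in TOY_AUTHORITY]
    return rec

def marc_synthesis(rec: Record) -> Record:
    # Skill 4: encode each heading as a MARC21 650 field.
    # Second indicator 0 means the heading comes from LCSH.
    rec.marc_fields = [f"650 _0 $a {h}." for h in rec.headings]
    return rec

SKILLS: List[Callable[[Record], Record]] = [
    conceptual_analysis,
    quantitative_filtering,
    authority_validation,
    marc_synthesis,
]

def run_pipeline(rec: Record) -> Record:
    # Execute the skills strictly in sequence, each consuming the prior output.
    for skill in SKILLS:
        rec = skill(rec)
    return rec

example = run_pipeline(Record(
    title="Artificial intelligence in cataloging",
    summary="How artificial intelligence supports cataloging and robots.",
))
print(example.marc_fields)
```

In this toy run, "robots" survives the filtering step but is dropped at authority validation because the placeholder authority table has no entry for it. This shows why, in a design of this kind, validation sits between concept selection and field synthesis.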
