Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing

Reilly Raab; Mike Parker; Dan Nally; Sadie Montgomery; Anastasia Bernat; Sai Munikoti; Sameera Horawalavithana

arXiv:2507.08109·cs.CL·July 14, 2025

TL;DR

This paper introduces a framework for creating transparent, auditable, and human-in-the-loop LM-powered subroutines, demonstrated through a public comment processing application for environmental reviews.

Contribution

It proposes a static typing framework for LM subroutines with auditability and online improvement, and implements it in a real-world public comment processing system.

Findings

01. The framework enables transparent auditing of LM artifacts.

02. The CommentNEPA system effectively summarizes public comments.

03. Quantitative evaluation shows alignment with human-annotated data.

Abstract

The advent of language models (LMs) has the potential to dramatically accelerate tasks that may be cast to text-processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner -- minimizing risk and allowing human experts to focus on informed decision-making rather than data-processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code -- such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data-dependencies) are recorded and exposed to audit on demand. We package…
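To make the abstract's idea concrete, here is a minimal sketch of what a statically typed, auditable LM subroutine might look like inside asynchronous Python. It is not the paper's API: `LMSubroutine`, `AuditRecord`, `give_feedback`, `parse_stance`, and `fake_lm` are hypothetical names, the LM is replaced by a deterministic stand-in, and "online improvement" is approximated by appending expert corrections as few-shot examples, which is an assumption rather than the authors' stated mechanism.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Generic, TypeVar

I = TypeVar("I")
O = TypeVar("O")


@dataclass
class AuditRecord:
    """One recorded call: prompt, input, raw LM output, and parsed result."""
    subroutine: str
    prompt: str
    value: object
    raw_output: str
    result: object


@dataclass
class LMSubroutine(Generic[I, O]):
    """Hypothetical typed wrapper around an async text-completion callable."""
    name: str
    instructions: str
    parse: Callable[[str], O]                  # enforces the declared output type
    complete: Callable[[str], Awaitable[str]]  # stand-in for a real LM client
    examples: list[tuple[I, O]] = field(default_factory=list)
    audit_log: list[AuditRecord] = field(default_factory=list)

    def build_prompt(self, value: I) -> str:
        shots = "".join(f"Input: {x}\nOutput: {y}\n" for x, y in self.examples)
        return f"{self.instructions}\n{shots}Input: {value}\nOutput:"

    async def __call__(self, value: I) -> O:
        prompt = self.build_prompt(value)
        raw = await self.complete(prompt)
        result = self.parse(raw)  # raises if the output does not fit the type
        self.audit_log.append(AuditRecord(self.name, prompt, value, raw, result))
        return result

    def give_feedback(self, value: I, corrected: O) -> None:
        """Store an expert correction so that later prompts include it."""
        self.examples.append((value, corrected))


def parse_stance(raw: str) -> str:
    """Accept only the three allowed labels; anything else is a type error."""
    stance = raw.strip().lower()
    if stance not in {"support", "oppose", "neutral"}:
        raise ValueError(f"unexpected stance: {raw!r}")
    return stance


async def fake_lm(prompt: str) -> str:
    """Toy LM: reuses the answer of an identical earlier example, else 'neutral'."""
    lines = prompt.splitlines()
    query = lines[-2]  # the final "Input: ..." line (single-line inputs assumed)
    for i, line in enumerate(lines[:-2]):
        if line == query:
            return lines[i + 1].removeprefix("Output: ")
    return "neutral"


async def demo():
    classify: LMSubroutine[str, str] = LMSubroutine(
        name="stance",
        instructions="Classify the stance of the public comment.",
        parse=parse_stance,
        complete=fake_lm,
    )
    comment = "Please do not approve this project."
    first = await classify(comment)             # no examples yet
    classify.give_feedback(comment, "oppose")   # sparse expert correction
    second = await classify(comment)            # prompt now carries the correction
    return first, second, classify.audit_log


if __name__ == "__main__":
    first, second, log = asyncio.run(demo())
    print(first, second, len(log))
```

Every call leaves an `AuditRecord` holding the exact prompt and raw output, so a reviewer can inspect after the fact why the second call differed from the first. A real system would swap `fake_lm` for an actual model client and would likely need a more careful feedback mechanism than verbatim example matching.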
