Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing
Reilly Raab, Mike Parker, Dan Nally, Sadie Montgomery, Anastasia Bernat, Sai Munikoti, Sameera Horawalavithana

TL;DR
This paper introduces a framework for creating transparent, auditable, and human-in-the-loop LM-powered subroutines, demonstrated through a public comment processing application for environmental reviews.
Contribution
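The paper proposes a framework for declaring statically typed, LM-powered subroutines whose artifacts can be audited on demand and whose performance improves online from sparse expert feedback, and it implements the framework in a real-world public comment processing system.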
Findings
The framework enables transparent auditing of LM artifacts.
The CommentNEPA system effectively summarizes public comments.
Quantitative evaluation shows alignment with human-annotated data.
Abstract
The advent of language models (LMs) has the potential to dramatically accelerate tasks that may be cast to text-processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner -- minimizing risk and allowing human experts to focus on informed decision-making rather than data-processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code -- such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data-dependencies) are recorded and exposed to audit on demand. We package…
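As a rough illustration of the idea in the abstract, the sketch below shows one way a typed, LM-powered subroutine could be declared inside ordinary asynchronous Python code while every prompt, input, output, and data dependency is written to an audit log. It is not the paper's implementation or API: the names `lm_subroutine`, `AuditRecord`, `AUDIT_LOG`, `stub_lm`, and the `depends_on` argument are all hypothetical, the LM is replaced by a canned stub, types are checked at call time rather than statically, and the online-improvement step driven by expert feedback is omitted.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any, get_type_hints


@dataclass
class AuditRecord:
    """One recorded LM call: prompt, inputs, output, and upstream record ids."""
    record_id: int
    subroutine: str
    prompt: str
    inputs: dict[str, Any]
    output: Any
    depends_on: list[int] = field(default_factory=list)


# Hypothetical in-memory audit log; a real system would persist this.
AUDIT_LOG: list[AuditRecord] = []


async def stub_lm(prompt: str) -> str:
    """Stand-in for a real LM call (assumption): returns canned text."""
    await asyncio.sleep(0)
    return "3" if "count" in prompt.lower() else "stub summary"


def lm_subroutine(template: str, lm=stub_lm):
    """Turn a type-annotated async signature into an audited LM call.

    Only plain classes (str, int, float, ...) are supported as annotations
    in this sketch; generic types such as list[str] are not handled.
    """
    def decorate(fn):
        hints = get_type_hints(fn)
        return_type = hints.pop("return")
        sig = inspect.signature(fn)

        async def wrapper(*args, depends_on=None, **kwargs):
            bound = sig.bind(*args, **kwargs)
            # Check each argument against its declared type.
            for name, value in bound.arguments.items():
                if not isinstance(value, hints[name]):
                    raise TypeError(f"{name} must be {hints[name].__name__}")
            prompt = template.format(**bound.arguments)
            raw = await lm(prompt)
            # Coerce the raw LM text to the declared return type.
            output = return_type(raw)
            AUDIT_LOG.append(AuditRecord(
                record_id=len(AUDIT_LOG),
                subroutine=fn.__name__,
                prompt=prompt,
                inputs=dict(bound.arguments),
                output=output,
                depends_on=list(depends_on or []),
            ))
            return output

        return wrapper
    return decorate


@lm_subroutine("Summarize this public comment: {comment}")
async def summarize(comment: str) -> str: ...


@lm_subroutine("Count the distinct concerns in: {summary}")
async def count_concerns(summary: str) -> int: ...


async def main():
    summary = await summarize("Please study the noise impact.")
    # Record that the count was derived from audit record 0.
    n = await count_concerns(summary, depends_on=[0])
    return summary, n


if __name__ == "__main__":
    print(asyncio.run(main()))
    for record in AUDIT_LOG:
        print(record)
```

Each call leaves a record that an auditor can inspect later, and the `depends_on` field lets a reviewer trace an output back to the upstream LM outputs it was computed from.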
