AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge

Karen Zhou; Chenhao Tan

arXiv:2603.07019 · cs.CL · March 10, 2026

Open Access

TL;DR

AutoChecklist is a modular, open-source library for generating and scoring evaluation checklists with LLM judges. It targets more interpretable, fine-grained evaluation, and the authors report that its checklist methods align with human preferences and adapt to new domains.

Contribution

The paper presents AutoChecklist, a library that unifies checklist-based evaluation into composable pipelines, organizes checklist generation under a taxonomy of five abstractions, and supports multiple LLM providers (OpenAI, OpenRouter, vLLM).

Findings

01 Checklist methods align well with human preferences.

02 AutoChecklist's pipelines improve evaluation consistency.

03 The framework supports domain-specific adaptation.

Abstract

Checklists have emerged as a popular approach for interpretable and fine-grained evaluation, particularly with LLM-as-a-Judge. Beyond evaluation, these structured criteria can serve as signals for model alignment, reinforcement learning, and self-correction. To support these use cases, we present AutoChecklist, an open-source library that unifies checklist-based evaluation into composable pipelines. At its core is a taxonomy of five checklist generation abstractions, each encoding a distinct strategy for deriving evaluation criteria. A modular Generator $\to$ Refiner $\to$ Scorer pipeline connects any generator with a unified scorer, and new configurations can be registered via prompt templates alone. The library ships with ten built-in pipelines implementing published approaches and supports multiple LLM providers (OpenAI, OpenRouter, vLLM). Beyond the Python API, the…
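
The Generator → Refiner → Scorer flow described in the abstract can be illustrated with a small sketch. This is not the AutoChecklist API: every name below (`register`, `generate`, `refine`, `score`, `run_pipeline`, `stub_judge`, and both prompt templates) is hypothetical, and the LLM call is replaced by a deterministic stub so the example runs offline. It only shows the general shape of the idea, in which a new generator configuration is added by registering a prompt template and a single shared scorer turns per-item yes/no verdicts into a pass rate.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: prompt text in, reply text out.
Judge = Callable[[str], str]

# Hypothetical scorer prompt shared by every pipeline.
SCORER_TEMPLATE = "CHECK\nCriterion: {item}\nResponse: {response}"

# Hypothetical registry: a generator configuration is just a named template.
GENERATOR_TEMPLATES: Dict[str, str] = {}


def register(name: str, template: str) -> None:
    """Register a generator configuration by prompt template alone."""
    GENERATOR_TEMPLATES[name] = template


def generate(judge: Judge, name: str, instruction: str) -> List[str]:
    """Generator: ask the judge for criteria, one per line."""
    reply = judge(GENERATOR_TEMPLATES[name].format(instruction=instruction))
    return [ln.strip("- ").strip() for ln in reply.splitlines() if ln.strip()]


def refine(items: List[str]) -> List[str]:
    """Refiner: drop case-insensitive duplicates, keeping first occurrences."""
    seen, kept = set(), []
    for item in items:
        key = item.lower()
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


def score(judge: Judge, items: List[str], response: str) -> float:
    """Scorer: fraction of checklist items the judge answers YES to."""
    if not items:
        return 0.0
    passed = sum(
        judge(SCORER_TEMPLATE.format(item=item, response=response))
        .strip().upper().startswith("YES")
        for item in items
    )
    return passed / len(items)


def run_pipeline(judge: Judge, name: str, instruction: str,
                 response: str) -> Tuple[List[str], float]:
    """Chain Generator -> Refiner -> Scorer."""
    items = refine(generate(judge, name, instruction))
    return items, score(judge, items, response)


def stub_judge(prompt: str) -> str:
    """Deterministic fake judge so the sketch runs without any provider."""
    if prompt.startswith("LIST"):
        return "- Mentions a deadline\n- Uses a polite tone\n- mentions a deadline"
    lines = prompt.split("\n")
    criterion = lines[1].split(": ", 1)[1]
    response = lines[2].split(": ", 1)[1]
    keyword = criterion.split()[-1].lower()  # crude keyword match, demo only
    return "YES" if keyword in response.lower() else "NO"


register("demo", "LIST criteria for: {instruction}")
items, pass_rate = run_pipeline(
    stub_judge, "demo",
    "Write a reminder email.",
    "Please send the report before the deadline.",
)
print(items, pass_rate)
```

In a real setting the stub would be swapped for a call to an actual provider, and the refiner would typically do more than deduplicate; those choices are left open here because the truncated abstract does not specify them.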

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques