Large language models require a new form of oversight: capability-based monitoring

Katherine C. Kellogg; Bingyang Ye; Yifan Hu; Guergana K. Savova; Byron Wallace; Danielle S. Bitterman

arXiv:2511.03106·cs.AI·November 6, 2025

Open Access

TL;DR

This paper introduces a new capability-based monitoring framework for large language models in healthcare, focusing on shared capabilities rather than task-specific performance to improve oversight and safety.

Contribution

It proposes a scalable, capability-centered approach for monitoring LLMs, addressing limitations of traditional task-based oversight in dynamic healthcare environments.

Findings

01 Capability-based monitoring enables detection of systemic weaknesses.

02 It allows cross-task identification of long-tail errors.

03 It supports safe, adaptive oversight of generalist AI models.

Abstract

The rapid adoption of large language models (LLMs) in healthcare has been accompanied by scrutiny of their oversight. Existing monitoring approaches, inherited from traditional machine learning (ML), are task-based and founded on assumed performance degradation arising from dataset drift. In contrast, with LLMs, inevitable model degradation due to changes in populations compared to the training dataset cannot be assumed, because LLMs were not trained for any specific task in any given population. We therefore propose a new organizing principle guiding generalist LLM monitoring that is scalable and grounded in how these models are developed and used in practice: capability-based monitoring. Capability-based monitoring is motivated by the fact that LLMs are generalist systems whose overlapping internal capabilities are reused across numerous downstream tasks. Instead of evaluating each…
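
The core idea in the abstract, monitoring shared capabilities rather than individual tasks, can be illustrated with a minimal sketch. This is not the authors' implementation. The task names, capability names, the `TASK_CAPABILITIES` mapping, and the `capability_error_rates` helper are all hypothetical, chosen only to show how error observations from several tasks might be pooled by the capability each task relies on.

```python
from collections import defaultdict

# Hypothetical mapping from deployed tasks to the capabilities they reuse.
# Neither the task names nor the capability names come from the paper.
TASK_CAPABILITIES = {
    "discharge_summary": ["summarization"],
    "patient_message_reply": ["summarization", "reasoning"],
    "triage_note_coding": ["information_extraction", "reasoning"],
}


def capability_error_rates(records):
    """Pool per-task error observations into per-capability error rates.

    records: iterable of (task_name, is_error) pairs, where is_error is a bool.
    Returns a dict mapping each capability to errors / observations.
    """
    counts = defaultdict(lambda: [0, 0])  # capability -> [errors, total]
    for task, is_error in records:
        for capability in TASK_CAPABILITIES[task]:
            counts[capability][0] += int(is_error)
            counts[capability][1] += 1
    return {cap: errors / total for cap, (errors, total) in counts.items()}
```

Because one observation counts toward every capability its task uses, a weakness in a shared capability can surface even when each individual task has too few examples to flag it on its own. A real deployment would also need a principled way to attribute an error to a specific capability, which this sketch does not attempt.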

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare