Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

Tianyi Huang; Samuel Xu; Jason Tansong Dang; Samuel Yan; Kimberley Yin

arXiv:2604.17487 · cs.CL · May 19, 2026

TL;DR

This paper introduces compositional selective specificity (CSS), a post-generation method for agentic systems that backs individual claims off to the most specific level the evidence appears to support, so that uncertainty is expressed locally instead of through a whole-answer refusal.

Contribution

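The paper proposes CSS, a post-generation layer that decomposes an answer into claims, proposes coarser backoffs for each, and emits every claim at the most specific calibrated level that appears admissible, with the aim of controlling overcommitment.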

Findings

01

Across a full LongFact run and HotpotQA pilots, calibrated CSS improves the risk-utility trade-off of fixed drafts.

02

On the full LongFact run, CSS raises overcommitment-aware utility from 0.846 to 0.913 relative to the no-CSS output.

03

On the full LongFact run, CSS achieves 0.938 specificity retention.

Abstract

Agentic systems often fail not by being entirely wrong, but by being too precise: a response may be generally useful while particular claims exceed what the evidence supports. We study this failure mode as overcommitment control and introduce compositional selective specificity (CSS), a post-generation layer that decomposes an answer into claims, proposes coarser backoffs, and emits each claim at the most specific calibrated level that appears admissible. The method is designed to express uncertainty as a local semantic backoff rather than as a whole-answer refusal. Across a full LongFact run and HotpotQA pilots, calibrated CSS improves the risk-utility trade-off of fixed drafts. On the full LongFact run, it raises overcommitment-aware utility from 0.846 to 0.913 relative to the no-CSS output while achieving 0.938 specificity retention. These results suggest that claim-level specificity…
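
The selection step described in the abstract (emit each claim at the most specific level that appears admissible) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `Candidate`, `select_level`, and `apply_css`, the use of a single support score compared against a threshold as the admissibility test, the fallback to the coarsest phrasing, and the example claims and scores, which are invented. Claim decomposition, backoff proposal, and calibration are assumed to have already happened upstream.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """One phrasing of a claim with a hypothetical calibrated support score."""
    text: str
    support: float  # assumed: calibrated probability that this phrasing is supported


def select_level(ladder: Sequence[Candidate], threshold: float) -> Candidate:
    """Pick the most specific candidate whose support meets the threshold.

    `ladder` must be ordered from most specific to coarsest. If no candidate
    is admissible, the coarsest one is returned; that fallback is a choice
    made for this sketch, not something the abstract specifies.
    """
    if not ladder:
        raise ValueError("each claim needs at least one candidate phrasing")
    for candidate in ladder:
        if candidate.support >= threshold:
            return candidate
    return ladder[-1]


def apply_css(claim_ladders: Sequence[Sequence[Candidate]],
              threshold: float = 0.9) -> List[str]:
    """Apply the selection rule to every claim independently."""
    return [select_level(ladder, threshold).text for ladder in claim_ladders]


# Invented example: two claims, each with coarser backoffs and made-up scores.
ladders = [
    [
        Candidate("The bridge opened on 27 May 1937.", 0.62),
        Candidate("The bridge opened in 1937.", 0.93),
        Candidate("The bridge opened in the 1930s.", 0.99),
    ],
    [
        Candidate("It is 2,737 metres long.", 0.95),
        Candidate("It is roughly 2.7 km long.", 0.98),
    ],
]

result = apply_css(ladders, threshold=0.9)
for line in result:
    print(line)
```

In this toy run the first claim is backed off by one level because its exact date falls below the threshold, while the second claim keeps its most specific phrasing. Raising the threshold trades specificity for lower risk of overcommitment, which is the kind of risk-utility trade-off the abstract refers to.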
