Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning

Abhinaba Basu

arXiv:2602.18922·cs.CL·March 24, 2026

Open Access · 1 dataset

TL;DR

This paper argues that existing agent caching methods fail because they optimize for classification accuracy rather than cache-key consistency and precision. It proposes a structured intent canonicalization framework based on few-shot learning that improves cache effectiveness and reduces cost.

Contribution

The paper introduces W5H2, a structured intent decomposition framework, and shows that pairing it with SetFit few-shot classification reaches 91.1% accuracy on MASSIVE, with a cascade that cuts costs by 97.5%.

Findings

01. GPTCache achieves 37.9% accuracy on real benchmarks; APC achieves 0-12%.

02. W5H2 with SetFit (8 examples per class) reaches 91.1% accuracy on MASSIVE.

03. The cascade approach handles 85% of interactions locally, reducing costs by 97.5%.
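The cascade in the third finding can be pictured with a minimal sketch. This page does not describe the paper's actual cascade, so everything here is an assumption made for illustration: the `CascadeCache` class, the `classify` and `call_llm` callables, the confidence threshold of 0.8, and the choice to cache the raw LLM output. The idea shown is only the general pattern of answering confident, previously seen intents locally and falling back to an LLM otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class CascadeCache:
    """Hypothetical two-tier cascade: cheap local classifier first, LLM as fallback."""
    classify: Callable[[str], Tuple[str, float]]  # assumed: returns (cache_key, confidence)
    call_llm: Callable[[str], str]                # assumed: expensive remote call
    threshold: float = 0.8                        # assumed value, not from the paper
    store: Dict[str, str] = field(default_factory=dict)
    local_hits: int = 0
    llm_calls: int = 0

    def handle(self, query: str) -> str:
        key, confidence = self.classify(query)
        confident = confidence >= self.threshold
        # Serve locally only when the key is trusted and already cached.
        if confident and key in self.store:
            self.local_hits += 1
            return self.store[key]
        # Otherwise pay for the LLM call.
        self.llm_calls += 1
        result = self.call_llm(query)
        # Cache only under keys the classifier was confident about.
        if confident:
            self.store[key] = result
        return result


def toy_classify(query: str) -> Tuple[str, float]:
    """Toy stand-in for a few-shot intent classifier."""
    if "alarm" in query.lower():
        return "set_alarm", 0.95
    return "unknown", 0.30


def toy_llm(query: str) -> str:
    """Toy stand-in for the LLM; returns a placeholder result."""
    return f"LLM result for: {query}"
```

With these toy stand-ins, a second alarm request is served from the store without an LLM call, while a low-confidence query always goes to the LLM and is never cached. The fraction `local_hits / (local_hits + llm_calls)` is the kind of quantity the 85% figure refers to, though how the paper measures it is not stated on this page.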

Abstract

Personal AI agents incur substantial cost via repeated LLM calls. We show existing caching methods fail: GPTCache achieves 37.9% accuracy on real benchmarks; APC achieves 0-12%. The root cause is optimizing for the wrong property: cache effectiveness requires key consistency and precision, not classification accuracy. We observe cache-key evaluation reduces to clustering evaluation and apply V-measure decomposition to separate these on n=8,682 points across MASSIVE, BANKING77, CLINC150, and NyayaBench v2, our new 8,514-entry multilingual agentic dataset (528 intents, 20 W5H2 classes, 63 languages). We introduce W5H2, a structured intent decomposition framework. Using SetFit with 8 examples per class, W5H2 achieves 91.1% ± 1.7% on MASSIVE in ~2ms, versus 37.9% for GPTCache and 68.8% for a 20B-parameter LLM at 3,447ms. On NyayaBench v2 (20 classes), SetFit achieves 55.3%, with…
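The abstract's observation that cache-key evaluation reduces to clustering evaluation can be made concrete with the standard V-measure definitions: homogeneity $h = 1 - H(C \mid K)/H(C)$, completeness $c = 1 - H(K \mid C)/H(K)$, and $V = 2hc/(h+c)$, where $C$ is the true intent and $K$ the assigned cache key. The sketch below is a plain-Python rendering of those textbook formulas, not the paper's evaluation code. Reading homogeneity as key precision and completeness as key consistency is an interpretation assumed here, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """H(target | given) computed from paired label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_intents: Sequence[Hashable],
              cache_keys: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (homogeneity, completeness, V) for cache keys against true intents."""
    h_c = _entropy(true_intents)
    h_k = _entropy(cache_keys)
    # Homogeneity: each key holds a single intent (assumed analogue of precision).
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(true_intents, cache_keys) / h_c
    # Completeness: each intent maps to a single key (assumed analogue of consistency).
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(cache_keys, true_intents) / h_k
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v
```

As a small illustration, four queries with two true intents that each receive a distinct key score perfect homogeneity but only 0.5 completeness: no key mixes intents, yet paraphrases of the same intent never share a key, so the cache would never hit. Collapsing all four queries onto one key gives the opposite failure, with completeness 1.0 and homogeneity 0.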

Code & Models

Datasets

biztiger/nyayabench-v2
dataset · 39 downloads

Taxonomy

Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Caching and Content Delivery