Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Alimurtaza Mustafa Merchant; Krish Veera; Sajal Kumar Goyla; Shambhawi Bhure; Dhaval Patel; Kaoutar El Maghraoui

arXiv:2605.20630 · cs.AI · May 21, 2026

TL;DR

This paper evaluates caching and workflow optimization techniques in industrial agent pipelines. It reports up to a 30.6x speedup on semantic-cache hits and a 40% reduction in median latency from workflow optimizations, and it identifies where semantic caching breaks down on parameter-rich queries.

Contribution

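The paper introduces a temporal semantic cache and a set of MCP workflow optimizations (disk-backed tool-discovery caching and dependency-aware parallel step execution) that make the pipeline more efficient, and it analyzes the limitations of caching in industrial workflows.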
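The listing does not say how the temporal semantic cache decides whether a cached answer is still valid. One plausible design is sketched below as an illustration only, not as the authors' implementation. A lookup counts as a hit only when the query's validity-determining parameters (here, asset IDs and ISO dates) match exactly, the wording is semantically close enough, and the entry is younger than a time-to-live. Several pieces are hypothetical: the regex-based `extract_params`, the bag-of-words `embed` (a stand-in for a real embedding model), the 0.9 threshold, and the one-hour TTL.

```python
import math
import re
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical validity parameters: asset IDs such as "Chiller 6" and ISO dates.
ASSET_RE = re.compile(r"\b(chiller|pump|compressor)\s*(\d+)\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_params(query: str) -> tuple:
    """Return the parameters that determine answer validity as a hashable key."""
    assets = sorted(f"{kind.lower()}-{num}" for kind, num in ASSET_RE.findall(query))
    dates = sorted(DATE_RE.findall(query))
    return (tuple(assets), tuple(dates))


def embed(query: str) -> Counter:
    """Toy bag-of-words vector (letters only); stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", query.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: Counter
    answer: str
    stored_at: float


@dataclass
class TemporalSemanticCache:
    threshold: float = 0.9       # hypothetical similarity cutoff
    ttl_seconds: float = 3600.0  # hypothetical freshness window
    buckets: dict = field(default_factory=dict)  # params key -> list[Entry]

    def get(self, query: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        vec = embed(query)
        # Only entries with identical assets and dates are even considered.
        for entry in self.buckets.get(extract_params(query), []):
            fresh = now - entry.stored_at <= self.ttl_seconds
            if fresh and cosine(vec, entry.vector) >= self.threshold:
                return entry.answer
        return None

    def put(self, query: str, answer: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        key = extract_params(query)
        self.buckets.setdefault(key, []).append(Entry(embed(query), answer, now))


cache = TemporalSemanticCache()
cache.put("Show sensor readings for Chiller 6 on 2020-03-01", "answer-A", now=0.0)
print(cache.get("show sensor readings for chiller 6 on 2020-03-01", now=10.0))  # hit: answer-A
print(cache.get("Show sensor readings for Chiller 9 on 2020-03-01", now=10.0))  # miss: other asset
```

In this design, entries are partitioned by extracted parameters before any embedding comparison. A query about a different asset or date therefore can never receive another query's answer, however similar the wording. The TTL separately bounds how stale an answer over live sensor data can get.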

Findings

01. Semantic caching can achieve up to 30.6x speedup on cache hits.

02. Workflow optimizations reduced median latency by 40%.

03. Pure semantic caching can fail on parameter-rich industrial queries.
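The third finding can be illustrated with a toy similarity-only cache. In the sketch below, a bag-of-words cosine similarity stands in for a real embedding model, and 0.85 is a hypothetical hit threshold. Neither comes from the paper. The two queries differ only in the asset number and share 7 of their 8 tokens, so the second query is served the first query's answer.

```python
import math
import re
from collections import Counter


def bow(text):
    """Toy bag-of-words vector; stands in for a sentence-embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PureSemanticCache:
    """Similarity-only cache: it has no notion of assets, sensors, or time."""

    def __init__(self, threshold=0.85):  # hypothetical hit threshold
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def put(self, query, answer):
        self.entries.append((bow(query), answer))

    def get(self, query):
        vec = bow(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None


cache = PureSemanticCache()
cache.put("show vibration readings for chiller 6 last week", "chiller-6 readings")
# Only "6" vs "9" differs: similarity is 7/8 = 0.875, above the threshold,
# so the query about chiller 9 is served chiller 6's cached answer.
print(cache.get("show vibration readings for chiller 9 last week"))
```

The same failure occurs along the time axis. A "last week" query re-issued a week later is textually identical to its cached copy, so it hits even though the correct answer has changed.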

Abstract

Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations…
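The two MCP workflow optimizations named in the abstract can be illustrated with short sketches under stated assumptions. Neither is the AssetOpsBench code.

- `cached_tool_list` persists a server's tool list as JSON on disk and reuses it while the file is younger than a TTL. The hypothetical `list_tools` callable stands in for an MCP `tools/list` request.
- `run_plan` uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to launch each plan step as soon as all of its dependencies have finished, so independent steps run concurrently. The dict-of-sets plan format and the `execute` callable are hypothetical.

```python
import asyncio
import json
import time
from graphlib import TopologicalSorter
from pathlib import Path


def cached_tool_list(server, list_tools, cache_dir, ttl=86400.0):
    """Return `server`'s tool list, reusing an on-disk JSON copy while it is fresh."""
    path = Path(cache_dir) / f"{server}.json"
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text())  # cache hit: skip tool discovery
    tools = list_tools()  # hypothetical stand-in for an MCP tools/list request
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tools))
    return tools


async def run_plan(deps, execute):
    """Run plan steps concurrently, starting each step once its dependencies finish.

    deps maps step -> set of steps it depends on; execute(step, results) is an
    async callable that may read the results of already finished steps.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a cyclic plan
    results, running = {}, {}
    while sorter.is_active():
        for step in sorter.get_ready():  # steps whose dependencies are all done
            running[asyncio.create_task(execute(step, results))] = step
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            step = running.pop(task)
            results[step] = task.result()
            sorter.done(step)  # may unlock steps that depend on this one
    return results


async def fake_tool(step, results):
    await asyncio.sleep(0.01)  # simulated tool latency
    return f"{step} done"


plan = {
    "summarize": {"sensor_data", "work_orders"},
    "sensor_data": {"discover_tools"},
    "work_orders": {"discover_tools"},
}
print(asyncio.run(run_plan(plan, fake_tool)))
```

In the example plan, `sensor_data` and `work_orders` depend only on `discover_tools`, so they execute concurrently. `summarize` starts only after both have finished. A purely sequential executor would instead wait on each step in turn.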
