Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Xiangru Tang; Tianrui Qin; Tianhao Peng; Ziyang Zhou; Daniel Shao; Tingting Du; Xinming Wei; Peng Xia; Fang Wu; He Zhu; Ge Zhang; Jiaheng Liu; Xingyao Wang; Sirui Hong; Chenglin Wu; Hao Cheng; Chi Wang; Wangchunshu Zhou

arXiv:2507.06229·cs.CL·October 28, 2025

TL;DR

This paper introduces AGENT KB, a universal memory system that enables cross-domain experience sharing among heterogeneous AI agents, significantly improving their problem-solving capabilities across various benchmarks.

Contribution

It presents AGENT KB, a novel shared memory infrastructure that facilitates cross-architecture knowledge transfer without retraining, enhancing agent performance across multiple frameworks.

Findings

01. Up to 18.7 percentage point improvement in pass@3 for smolagents.

02. 4.0 percentage point improvement on SWE-bench pass@1 for OpenHands.

03. Hybrid retrieval and feedback are crucial for performance gains.

Abstract

AI agent frameworks operate in isolation, forcing agents to rediscover solutions and repeat mistakes across different systems. Despite valuable problem-solving experiences accumulated by frameworks like smolagents, OpenHands, and OWL, this knowledge remains trapped within individual systems, preventing the emergence of collective intelligence. Current memory systems focus on individual agents or framework-specific demonstrations, failing to enable cross-architecture knowledge transfer. We introduce AGENT KB, a universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining. AGENT KB aggregates trajectories into a structured knowledge base and serves lightweight APIs. At inference time, hybrid retrieval operates through two stages: planning seeds agents with cross-domain workflows, while feedback applies targeted diagnostic…
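The hybrid retrieval mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "hybrid" means combining a lexical score (BM25, as one reviewer describes it) with a semantic similarity score. The function names, the `alpha` weight, the min-max normalization, and the bag-of-words cosine used as a stand-in for a real embedding model are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Stand-in for semantic similarity: cosine over bag-of-words counts.
    A real system would use a learned embedding model here (assumption)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for d in docs:
        c = Counter(tokenize(d))
        dot = sum(q[t] * c[t] for t in q)
        d_norm = math.sqrt(sum(v * v for v in c.values()))
        out.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return out


def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, docs, alpha=0.5):
    """Return document indices ordered by a weighted mix of both scores."""
    lex = min_max(bm25_scores(query, docs))
    sem = min_max(cosine_scores(query, docs))
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


# Hypothetical stored experiences, invented for this example.
experiences = [
    "parse pdf table then verify totals with a calculator tool",
    "reproduce failing unit test before editing the source file",
    "search the web then cross-check the answer against two sources",
]
ranking = hybrid_rank("fix failing unit test in source file", experiences)
```

Here `ranking[0]` is the index of the experience that best matches the query under both signals. The weight `alpha` trades off exact keyword overlap against semantic closeness; how the paper balances the two is not stated in the text above.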

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The authors built a memory system that demonstrates performance improvements on evals across different model families, frameworks, and eval types. Using a cross-framework memory store is new and builds on prior memory stores that focus on a single framework. By collecting trajectories across multiple frameworks, the overall system can benefit from the diversity coming from different frameworks. Separating planning from feedback is also relatively new. Evaluation across a number of domains i

Weaknesses

This paper could be substantially stronger if the improvements from Agent KB style memory were compared to gains from other memory systems (e.g., other classic RAG systems / embedding database, scratchpad, or other API-based memory store). This would help establish novelty compared to other similar systems, and make clearer why this approach is superior and worth continuing to push on. The authors evaluate on SWE-Bench from 2023, which is known to have ~50% broken problems and is not the communi

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The motivation—enabling collective intelligence across different agent frameworks—is timely and well-justified. 2. The paper shows consistent and substantial gains across diverse benchmarks, agent types, and model backbones, with convincing ablation studies to support claims.

Weaknesses

1. The framework-agnostic experience representation appears to be the key innovation, yet its implementation details and technical challenges are not clearly explained (after reading the appendices). 2. The disagreement gate for refinement rejection, another key contribution, seems heuristic-driven; it may wrongly reject beneficial refinements (that might make the embedding similarity low) if the initial plan is flawed. 3. While the experiments are extensive, the paper’s readability and structur

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The research question is well motivated, targeting three challenges: representation heterogeneity, context mismatch, and knowledge interference. This is an impactful problem. The quantitative empirical results are strong, with consistent improvements over prior memory-based systems such as A-MEM.

Weaknesses

1. The ability to abstract and distill the "heterogeneous agent trajectories into structured experience units" is a key concept that the pipeline relies on, and it is implemented by "few-shot prompting (10-15 human-curated exemplars per domain)" and "standardized action vocabularies". Claiming that the method is seamless is somewhat brittle if this is left unjustified. The reliance on "standardized action vocabularies" seems to hide a great deal of complexity. Does this mean the system can only integrat

Reviewer 04 · Rating 2 · Confidence 4

Strengths

1. This paper addresses an important and timely research problem: how to manage agent memory effectively for complicated agentic tasks. 2. The performance reported on GAIA and SWE-bench is promising and encouraging. 3. The system seems plug-and-play: it can be used by different scaffolds and shows promising results.

Weaknesses

1. The novelty is limited and very engineering-heavy: - Reason-Retrieve-Refine is largely borrowed from the case-based reasoning literature - hybrid retrieval (BM25 + semantic) is standard practice, and few-shot experience generation uses straightforward prompting - The disagreement threshold is based on embeddings, which is very common in many RAG works, and it is not clear how to set this threshold, which may be tuned on the target data. 2. The experiments are weak: - There are no compar
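Two of the reviews above discuss an embedding-based disagreement gate that can reject a refinement. A minimal sketch of such a gate follows, assuming that it compares the initial plan with the refined plan by cosine similarity and keeps the initial plan when similarity falls below a threshold. The function names, the threshold value of 0.5, and the bag-of-words stand-in for an embedding model are hypothetical; the paper's actual rule is not given in the text here.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: bag-of-words counts.
    A real gate would call an embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    a_norm = math.sqrt(sum(v * v for v in a.values()))
    b_norm = math.sqrt(sum(v * v for v in b.values()))
    return dot / (a_norm * b_norm) if a_norm and b_norm else 0.0


def disagreement_gate(initial_plan, refined_plan, threshold=0.5):
    """Accept the refinement only if it stays close to the initial plan.
    The threshold is a hypothetical value chosen for this example."""
    similarity = cosine(embed(initial_plan), embed(refined_plan))
    return refined_plan if similarity >= threshold else initial_plan


initial = "open the file and run the tests"
small_edit = "open the file and run the failing tests first"
rewrite = "search the web for a similar issue"

accepted = disagreement_gate(initial, small_edit)  # close to the initial plan
rejected = disagreement_gate(initial, rewrite)     # far from the initial plan
```

In this sketch the small edit passes the gate while the full rewrite is discarded, which also shows the concern raised by Reviewer 02: if the initial plan is flawed, a rewrite that departs from it is rejected even when it would have been the better plan.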

Code & Models

Repositories

oppo-personalai/agent-kb · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Reinforcement Learning in Robotics

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · Balanced Selection · GPT-4