Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation
Robin D. Pesl, Jerin G. Mathew, Massimo Mecella, Marco Aiello

TL;DR
This paper explores how to preprocess OpenAPI descriptions for better retrieval-augmented generation, introducing a discovery agent to improve endpoint retrieval efficiency and accuracy in system integration tasks.
Contribution
The paper proposes a novel preprocessing method for OpenAPI descriptions and a discovery agent, both of which enhance API endpoint retrieval using RAG, reducing token usage and improving retrieval metrics.
Findings
LLM-based chunking outperforms naive methods
Discovery agent improves retrieval recall and precision
Effective API description preprocessing reduces token count
Abstract
Integrating multiple (sub-)systems is essential to create advanced Information Systems (ISs). Difficulties mainly arise when integrating dynamic environments across the IS lifecycle. A traditional approach is a registry that provides the API documentation of the systems' endpoints. Large Language Models (LLMs) have been shown to be capable of automatically creating system integrations (e.g., as service compositions) based on this documentation, but they require concise input due to input token limitations, especially regarding comprehensive API descriptions. Currently, it is unknown how best to preprocess these API descriptions. Within this work, we (i) analyze the usage of Retrieval-Augmented Generation (RAG) for endpoint discovery and the chunking, i.e., preprocessing, of OpenAPIs to reduce the input token length while preserving the most relevant information. To further reduce the input token…
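To make the idea of chunking an OpenAPI description for endpoint retrieval concrete, here is a minimal sketch in Python. It is not the paper's method: it assumes a naive split with one chunk per operation (path plus HTTP method), and it swaps in a bag-of-words cosine similarity where a real RAG pipeline would use an embedding model. The function names `chunk_by_endpoint` and `retrieve`, and the sample specification, are hypothetical and exist only for illustration.

```python
import re
from collections import Counter
from math import sqrt

# HTTP methods that may appear as operation keys in an OpenAPI Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def chunk_by_endpoint(spec):
    """Split a parsed OpenAPI document into one text chunk per operation."""
    chunks = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            parts = [method.upper(), path, op.get("summary", ""), op.get("description", "")]
            chunks.append({
                "endpoint": f"{method.upper()} {path}",
                "text": " ".join(p for p in parts if p),
            })
    return chunks


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks, query, k=1):
    """Return the k endpoints whose chunk text is most similar to the query.

    Assumption: bag-of-words similarity stands in for an embedding model.
    """
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c["text"]))), reverse=True)
    return [c["endpoint"] for c in ranked[:k]]


# Hypothetical, hand-written specification fragment used only for this sketch.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/pets": {
            "get": {"summary": "List all pets"},
            "post": {"summary": "Create a new pet"},
        },
        "/orders/{orderId}": {
            "parameters": [],
            "get": {"summary": "Fetch an order by its identifier"},
        },
    },
}

chunks = chunk_by_endpoint(spec)
print(retrieve(chunks, "create a pet"))
```

Only the retrieved chunks, rather than the whole specification, would then be passed to the LLM, which is where the reduction in input tokens comes from in this kind of setup.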
Taxonomy
Methods: Attention Is All You Need · Weight Decay · Linear Warmup With Linear Decay · Linear Layer · Layer Normalization · WordPiece · Attention Dropout · Multi-Head Attention
