API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Zhen Guo, Adriana Meza Soria, Wei Sun, Yikang Shen, Rameswar Panda

TL;DR
API Pack is a large multi-programming language dataset that significantly improves API call generation in language models through fine-tuning, outperforming some proprietary models and enhancing cross-language API generalization.
Contribution
Introduces API Pack, a massive dataset for API call generation, and demonstrates its effectiveness in fine-tuning models to outperform GPT-3.5 and GPT-4 in API call generation tasks.
Findings
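Fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack enables it to outperform GPT-3.5 and GPT-4 in generating code for entirely new API calls.
Fine-tuning on a large dataset in one language, combined with smaller datasets from other languages, improves API generation accuracy across multiple languages.
Increasing the fine-tuning data to one million instances improves generalization to new APIs.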
Abstract
We introduce API Pack, a massive multi-programming language dataset containing over one million instruction-API calls for improving the API call generation capabilities of large language models. Our evaluation highlights three key findings: First, fine-tuning on API Pack enables open-source models to outperform GPT-3.5 and GPT-4 in generating code for entirely new API calls. We show this by fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack. Second, fine-tuning on a large dataset in one language, combined with smaller datasets from others, improves API generation accuracy across multiple languages. Third, we confirm the benefits of larger datasets for API generalization, as increasing fine-tuning data to one million instances enhances generalization to new APIs. To support further research, we open-source the API Pack dataset, trained model, and code at…
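The second finding in the abstract (a large dataset in one language combined with smaller datasets from others) can be illustrated with a minimal sketch of how such a fine-tuning mixture might be assembled. This is not the authors' pipeline: the record fields `lang`, `instruction`, and `api_call`, the helper `build_mixture`, and all sample sizes are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def build_mixture(instances, primary_lang, primary_n, secondary_n, seed=0):
    """Sample up to primary_n instances in primary_lang and up to
    secondary_n instances from every other language, then shuffle.

    Field names and quotas are illustrative assumptions, not the
    API Pack schema.
    """
    rng = random.Random(seed)

    # Group instances by their (hypothetical) "lang" field.
    by_lang = {}
    for inst in instances:
        by_lang.setdefault(inst["lang"], []).append(inst)

    mixture = []
    for lang, pool in sorted(by_lang.items()):
        quota = primary_n if lang == primary_lang else secondary_n
        # Never request more items than the pool holds.
        mixture.extend(rng.sample(pool, min(quota, len(pool))))

    rng.shuffle(mixture)
    return mixture


# Toy instruction/API-call pairs standing in for real dataset records.
toy = [
    {"lang": lang, "instruction": f"Call endpoint {i}", "api_call": f"<{lang} call {i}>"}
    for lang in ("python", "java", "go")
    for i in range(50)
]

mix = build_mixture(toy, primary_lang="python", primary_n=40, secondary_n=5)
counts = Counter(inst["lang"] for inst in mix)
```

In this toy run the mixture is dominated by the primary language, with a small fixed quota from each of the others. The ratio actually used in the paper is not stated in this summary, so the 40/5/5 split here is a placeholder.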
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Web Data Mining and Analysis · Advanced Software Engineering Methodologies
Methods: Cosine Annealing · Position-Wise Feed-Forward Layer · Attention Is All You Need · Dropout · Linear Layer · Linear Warmup With Cosine Annealing · Attention Dropout
