Knowledge-Graph-Driven Data Synthesis for Low-Resource Software Development: A HarmonyOS Case Study
Mingwei Liu, Zheng Pei, Yanlin Wang, Zihao Wang, Zikang Li, Enci Lin, Xin Peng, Zibin Zheng

TL;DR
This paper introduces APIKG4Syn, a knowledge-graph-based data synthesis framework that improves code generation for low-resource frameworks. In a HarmonyOS case study, fine-tuning Qwen2.5-Coder-7B on the synthesized data reaches 25.00% pass@1.
Contribution
The paper presents an API-knowledge-graph-driven data synthesis method for fine-tuning LLMs in low-resource software development, with HarmonyOS as the target framework.
Findings
Fine-tuning Qwen2.5-Coder-7B on data synthesized by APIKG4Syn achieves 25.00% pass@1.
Larger synthesized data volumes improve fine-tuning performance.
The best-performing data mix is 8:2, single-API samples to multi-API samples.
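To make the 8:2 ratio concrete, the sketch below shows one way a fine-tuning set could be assembled from two pools of synthesized samples. It is a minimal illustration, not the authors' pipeline: the function name `mix_by_ratio`, its parameters, and the sampling strategy are all assumptions introduced here.

```python
import random


def mix_by_ratio(single_api, multi_api, total, single_share=0.8, seed=0):
    """Draw `total` samples, `single_share` of them from the single-API pool.

    Hypothetical helper: the paper reports the 8:2 ratio, not this procedure.
    """
    n_single = round(total * single_share)
    n_multi = total - n_single
    if n_single > len(single_api) or n_multi > len(multi_api):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = rng.sample(single_api, n_single) + rng.sample(multi_api, n_multi)
    rng.shuffle(mixed)  # avoid presenting all single-API samples first
    return mixed


# Toy pools standing in for synthesized question-code pairs.
single_pool = [("single", i) for i in range(100)]
multi_pool = [("multi", i) for i in range(100)]
train_set = mix_by_ratio(single_pool, multi_pool, total=50)
```

With `total=50` and the default share, the result holds 40 single-API and 10 multi-API samples in shuffled order.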
Abstract
In low-resource framework development (e.g., HarmonyOS), large language models (LLMs) often lack sufficient pre-training exposure, resulting in poor code generation performance. Although they generally preserve programming logic across languages, they frequently fail on framework-specific APIs and syntax, revealing a gap between learned algorithmic knowledge and unfamiliar framework conventions. Consequently, even advanced models such as GPT-4o struggle to produce correct code without prior exposure. Inspired by these challenges, we propose APIKG4Syn, a framework that leverages API knowledge graphs to synthesize API-oriented question-code pairs without requiring executable environments. It incorporates both single-API and multi-API information, with the latter guided by uncertainty estimation (UE) and Monte Carlo Tree Search (MCTS), to construct high-quality fine-tuning data. For…
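As a rough illustration of how an API knowledge graph can seed single-API and multi-API synthesis prompts, consider the sketch below. Everything in it is an assumption made for illustration: the graph contents and API names are placeholders rather than real HarmonyOS APIs, the helper names are hypothetical, and a plain random walk stands in for the UE- and MCTS-guided selection that the abstract describes.

```python
import random

# Hypothetical toy graph: API name -> short description and related APIs.
# The names are placeholders, not real HarmonyOS APIs.
API_GRAPH = {
    "fileOpen": {"doc": "Open a file and return a handle.",
                 "related": ["fileRead", "fileClose"]},
    "fileRead": {"doc": "Read bytes from an open handle.",
                 "related": ["fileClose"]},
    "fileClose": {"doc": "Release an open handle.",
                  "related": []},
}


def single_api_prompt(graph, api):
    """Build a synthesis prompt centered on one API node."""
    return (f"Write a question and a solution that uses `{api}`: "
            f"{graph[api]['doc']}")


def sample_api_path(graph, start, max_len, rng):
    """Random walk over 'related' edges without revisiting a node.

    Stand-in for guided search; the paper uses UE and MCTS here.
    """
    path = [start]
    while len(path) < max_len:
        candidates = [a for a in graph[path[-1]]["related"] if a not in path]
        if not candidates:
            break
        path.append(rng.choice(candidates))
    return path


def multi_api_prompt(graph, path):
    """Build a synthesis prompt that asks for a combination of APIs."""
    docs = "\n".join(f"- `{a}`: {graph[a]['doc']}" for a in path)
    return "Write a question and a solution that combines these APIs:\n" + docs


rng = random.Random(0)
path = sample_api_path(API_GRAPH, "fileOpen", max_len=3, rng=rng)
single_prompt = single_api_prompt(API_GRAPH, "fileOpen")
multi_prompt = multi_api_prompt(API_GRAPH, path)
```

The prompts would then be sent to a generator model to produce question-code pairs. Because nothing is executed, this kind of construction does not depend on a runnable environment for the target framework.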
