ArkTS-CodeSearch: An Open-Source ArkTS Dataset for Code Retrieval
Yulong He, Artem Ermakov, Sergey Kovalchuk, Artem Aliev, Dmitry Shalymov

TL;DR
This paper introduces the first large-scale ArkTS dataset and benchmark for code retrieval, enabling improved code understanding and retrieval tasks within the OpenHarmony ecosystem.
Contribution
It provides a comprehensive ArkTS dataset from open-source repositories and evaluates models for ArkTS code retrieval, filling a critical gap in ArkTS code intelligence research.
Findings
Fine-tuning on ArkTS and TypeScript training data yields a high-performing model for ArkTS code understanding
Effective cross-platform deduplication of ArkTS functions
Established the first systematic benchmark for ArkTS code retrieval
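The summary does not say how the cross-platform deduplication is done. As a minimal sketch, assuming an exact-match strategy over whitespace-normalized function bodies, duplicates crawled from GitHub and Gitee could be collapsed as follows. The helper names (`normalize`, `dedup_pairs`) and the record fields (`source`, `comment`, `code`) are hypothetical and are not taken from the paper.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting differences do not change the hash."""
    return re.sub(r"\s+", " ", code).strip()


def dedup_pairs(pairs):
    """Keep the first occurrence of each function body, regardless of platform.

    Assumption: two functions are duplicates when their normalized text is identical.
    """
    seen = set()
    unique = []
    for pair in pairs:
        digest = hashlib.sha256(normalize(pair["code"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(pair)
    return unique


# Toy comment-function pairs (illustrative only, not from the dataset).
pairs = [
    {"source": "github", "comment": "Add two numbers",
     "code": "function add(a: number, b: number) {\n  return a + b;\n}"},
    {"source": "gitee", "comment": "Add two numbers",
     "code": "function add(a: number, b: number) {\n    return a + b;\n}"},
    {"source": "gitee", "comment": "Say hi",
     "code": "function hi() { console.log('hi'); }"},
]
unique_pairs = dedup_pairs(pairs)
```

Here the two `add` functions differ only in indentation, so the Gitee copy is dropped. A near-duplicate detector (for example, token-level similarity) would catch more cases but is outside this sketch.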
Abstract
ArkTS is a core programming language in the OpenHarmony ecosystem, yet research on ArkTS code intelligence is hindered by the lack of public datasets and evaluation benchmarks. This paper presents a large-scale ArkTS dataset constructed from open-source repositories, targeting code retrieval and code evaluation tasks. We design a single-search task, where natural language comments are used to retrieve corresponding ArkTS functions. ArkTS repositories are crawled from GitHub and Gitee, and comment-function pairs are extracted using tree-sitter-arkts, followed by cross-platform deduplication and statistical analysis of ArkTS function types. We further evaluate existing open-source code embedding models on the single-search task and perform fine-tuning using both ArkTS and TypeScript training datasets, resulting in a high-performing model for ArkTS code understanding. This work establishes…
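To illustrate the retrieval task described above, the sketch below ranks candidate functions for each comment by cosine similarity and scores the ranking with mean reciprocal rank (MRR). MRR is an assumption here: it is a common code-search metric, but the abstract does not name the metric used. The vectors are toy values standing in for model embeddings, and `cosine` and `mean_reciprocal_rank` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_reciprocal_rank(query_vecs, code_vecs):
    """Assumes the correct function for query i is code_vecs[i]."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [cosine(q, c) for c in code_vecs]
        # Rank = 1 + number of candidates scoring strictly higher than the true match.
        rank = 1 + sum(1 for s in scores if s > scores[i])
        total += 1.0 / rank
    return total / len(query_vecs)


# Toy embeddings: one comment vector and one function vector per pair.
query_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
code_vecs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]
mrr = mean_reciprocal_rank(query_vecs, code_vecs)
```

In this toy setup the first and third comments retrieve their own function at rank 1, while the second comment's function is outranked by one other candidate, so it contributes a reciprocal rank of 0.5.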
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software Testing and Debugging Techniques
