Teaching a Language Model to Speak the Language of Tools
Simeon Emanuilov

TL;DR
This paper introduces a method to adapt multilingual language models for reliable tool use in any language, demonstrated with Bulgarian, improving function-calling accuracy and output quality while maintaining core language understanding.
Contribution
The study presents a novel bilingual dataset, continued training methodology, and TUCAN framework to enhance tool use in non-English language models, with publicly released resources.
Findings
- Up to 28.75% improvement in function-calling accuracy
- Models produce clean, parsable function calls
- Core language understanding preserved
Abstract
External tool integration through function-calling is essential for practical language model applications, yet most multilingual models lack reliable tool-use capabilities in non-English languages. Even state-of-the-art multilingual models struggle with determining when to use tools and generating the structured outputs required for function calls, often exhibiting language confusion when prompted in lower-resource languages. This work presents a methodology for adapting existing language models to enable robust tool use in any target language, using Bulgarian as a case study. The approach involves continued training of the BgGPT model series (2.6B, 9B, 27B parameters) on a novel bilingual dataset of 10,035 function-calling examples designed to support standardized protocols like MCP (Model Context Protocol). The research introduces TUCAN (Tool-Using Capable Assistant Navigator), which…
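To make "clean, parsable function calls" concrete, here is a minimal sketch of the check a tool-calling host might run on a model's reply. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `parse_function_call` helper are all assumptions chosen for illustration; they are not the format or code used in the paper.

```python
import json

# Hypothetical tool registry (illustrative only, not from the paper):
# each tool lists its required argument names and the expected Python type
# of every accepted argument.
TOOLS = {
    "get_weather": {
        "required": ["city"],
        "properties": {"city": str, "unit": str},
    },
}


def parse_function_call(raw: str):
    """Return (name, arguments) if `raw` is a clean, valid call, else None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # surrounding prose or malformed JSON: not parsable
    if not isinstance(call, dict):
        return None

    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return None

    spec = TOOLS.get(name)
    if spec is None:
        return None  # the model named a tool that does not exist

    # Every required argument must be present.
    if any(key not in args for key in spec["required"]):
        return None

    # Every supplied argument must be known and have the expected type.
    for key, value in args.items():
        expected = spec["properties"].get(key)
        if expected is None or not isinstance(value, expected):
            return None

    return name, args
```

Under these assumptions, a reply that wraps the call in conversational text, or that omits a required argument, is rejected outright rather than repaired, which is one simple way to measure whether output is directly usable by a tool runtime.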
Code & Models
- 🤗 llm-bg/Tucan-2.6B-v1.0 · model · 13 dl · ♡ 4
- 🤗 llm-bg/Tucan-2.6B-v1.0-LoRA · model
- 🤗 llm-bg/Tucan-2.6B-v1.0-GGUF · model · 148 dl · ♡ 1
- 🤗 llm-bg/Tucan-9B-v1.0-LoRA · model
- 🤗 llm-bg/Tucan-9B-v1.0 · model · 7 dl · ♡ 2
- 🤗 llm-bg/Tucan-9B-v1.0-GGUF · model · 133 dl · ♡ 1
- 🤗 llm-bg/Tucan-27B-v1.0 · model · 12 dl · ♡ 2
- 🤗 llm-bg/Tucan-27B-v1.0-LoRA · model
- 🤗 llm-bg/Tucan-27B-v1.0-GGUF · model · 112 dl · ♡ 1
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Model-Driven Software Engineering Techniques
