To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble

Haechan Mark Bong; Simon Roy; Euhid Aman; Giovanni Beltrame

arXiv:2605.21242·cs.RO·May 21, 2026

TL;DR

This paper distills robot skill prediction into a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) trained on synthetic data. The ensemble outperforms much larger zero-shot LLMs at task-to-skill matching.

Contribution

The paper demonstrates that small, specialized models trained on synthetic data can predict the physical skills a task requires more accurately than larger general-purpose LLMs, which supports task routing across a heterogeneous robot fleet.
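To make the routing step concrete, here is a minimal sketch of how a predicted skill set could be used to shortlist robots. It is not the authors' implementation: the `FLEET` table, its capability assignments, and the `eligible_robots` helper are hypothetical, and only the skill names and robot types come from the abstract.

```python
# Hypothetical fleet: robot name -> physical capabilities it offers.
# The capability assignments are illustrative assumptions.
FLEET = {
    "drone": {"fly"},
    "rover": {"wheels"},
    "quadruped": {"legs"},
    "humanoid": {"legs", "hands"},
}


def eligible_robots(required_skills, fleet):
    """Return the robots whose capabilities cover every required skill."""
    required = set(required_skills)
    # A robot is eligible when the required set is a subset of its capabilities.
    return sorted(name for name, caps in fleet.items() if required <= caps)


# Skills a predictor might output for a task such as "open the door and walk in".
candidates = eligible_robots({"legs", "hands"}, FLEET)
walkers = eligible_robots({"legs"}, FLEET)
```

With this table, a task needing both `legs` and `hands` leaves only the humanoid, while a task needing only `legs` leaves the humanoid and the quadruped.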

Findings

01

The small ensemble reaches 83.5% task-to-skill matching on a stratified 200-task dataset.

02

Trained only on synthetic data, the ensemble surpasses much larger zero-shot LLMs: Kimi K2 (72.0%), GPT-OSS-120B (71.5%), and Llama-4-Scout-17B (69.0%).

03

Because no labelled task-to-skill data exists, the training set is synthetic, built with LLM-assisted generation and targeted label auditing.

Abstract

As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to robots' physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results…
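The abstract does not say how the two encoders' outputs are combined, so the following is only a sketch under assumptions: each model emits one probability per skill, the ensemble averages them, and a skill is predicted when the average reaches a threshold of 0.5. The `ensemble_predict` helper, the threshold, and the example scores are hypothetical; only the six skill names come from the abstract.

```python
# Skill labels listed in the abstract.
SKILLS = ["fly", "wheels", "legs", "surface water", "under water", "hands"]


def ensemble_predict(probs_a, probs_b, threshold=0.5):
    """Average per-skill probabilities from two models, then threshold.

    Averaging and the 0.5 cutoff are assumptions, not the paper's stated method.
    """
    if len(probs_a) != len(SKILLS) or len(probs_b) != len(SKILLS):
        raise ValueError("expected one probability per skill from each model")
    averaged = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    return {skill for skill, p in zip(SKILLS, averaged) if p >= threshold}


# Made-up scores for a task like "photograph cracks under a bridge deck".
mpnet_probs = [0.92, 0.10, 0.05, 0.20, 0.02, 0.30]
minilm_probs = [0.80, 0.20, 0.15, 0.10, 0.04, 0.60]

predicted = ensemble_predict(mpnet_probs, minilm_probs)
```

In this example the two models disagree on `hands` (0.30 versus 0.60); the average falls below the cutoff, so only `fly` is predicted. Averaging is one simple way an ensemble can damp a single model's overconfident label.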
