LLM-Supported Natural Language to Bash Translation
Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu

TL;DR
This paper introduces a large, manually verified dataset and a novel heuristic for evaluating the accuracy of large language models in translating natural language instructions into Bash commands, significantly improving assessment reliability.
Contribution
It provides the largest dataset to date for NL2SH translation and a new functional equivalence heuristic that enhances evaluation accuracy for LLMs in command translation tasks.
Findings
The test and training datasets are 441% and 135% larger than previous datasets, respectively.
The heuristic determines the functional equivalence of two commands with 95% confidence, 16% better than previous methods.
NL2SH accuracy can be improved by up to 32% using various translation techniques.
Abstract
The Bourne-Again Shell (Bash) command-line interface for Linux systems has complex syntax and requires extensive specialized knowledge. Using the natural language to Bash command (NL2SH) translation capabilities of large language models (LLMs) for command composition circumvents these issues. However, the NL2SH performance of LLMs is difficult to assess due to inaccurate test data and unreliable heuristics for determining the functional equivalence of Bash commands. We present a manually verified test dataset of 600 instruction-command pairs and a training dataset of 40,939 pairs, increasing the size of previous datasets by 441% and 135%, respectively. Further, we present a novel functional equivalence heuristic that combines command execution with LLM evaluation of command outputs. Our heuristic can determine the functional equivalence of two Bash commands with 95% confidence, a 16%…
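The heuristic described in the abstract, command execution followed by LLM evaluation of the outputs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`run_command`, `functionally_equivalent`, `whitespace_judge`) are hypothetical, and the LLM judge is replaced by a simple stand-in callable so the example runs without a model. The sketch assumes `bash` is on the PATH and that commands run in a disposable environment, since it executes them directly.

```python
import subprocess
from typing import Callable


def run_command(command: str, timeout: float = 10.0) -> str:
    """Run a Bash command and return its standard output as text."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def functionally_equivalent(
    reference: str,
    candidate: str,
    judge: Callable[[str, str], bool],
) -> bool:
    """Execute both commands, then compare their outputs.

    Identical outputs count as equivalent immediately. Otherwise the
    decision is delegated to `judge`, which stands in for the LLM
    evaluation step (an assumption of this sketch).
    """
    reference_output = run_command(reference)
    candidate_output = run_command(candidate)
    if reference_output == candidate_output:
        return True
    return judge(reference_output, candidate_output)


def whitespace_judge(reference_output: str, candidate_output: str) -> bool:
    """Hypothetical placeholder for an LLM judge.

    It treats outputs as equivalent when they match after collapsing
    whitespace; a real judge would prompt a model with both outputs.
    """
    return reference_output.split() == candidate_output.split()
```

For example, `functionally_equivalent("echo hello", "printf 'hello\\n'", whitespace_judge)` executes two syntactically different commands and accepts them because their outputs match. Passing the judge as a callable keeps the execution step independent of any particular model or prompt.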
Code & Models
- 🤗 westenfelder/Llama-3.2-1B-Instruct-NL2SH (model) · 1 dl
- 🤗 westenfelder/Llama-3.1-8B-Instruct-NL2SH (model) · 3 dl · ♡ 1
- 🤗 westenfelder/Llama-3.2-3B-Instruct-NL2SH (model) · 4 dl · ♡ 1
- 🤗 westenfelder/Qwen2.5-Coder-0.5B-Instruct-NL2SH (model) · 4 dl
- 🤗 westenfelder/Qwen2.5-Coder-7B-Instruct-NL2SH (model) · 4 dl · ♡ 1
- 🤗 westenfelder/Qwen2.5-Coder-3B-Instruct-NL2SH (model) · 3 dl · ♡ 1
- 🤗 westenfelder/Qwen2.5-Coder-1.5B-Instruct-NL2SH (model) · 2 dl · ♡ 1
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
