Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language
Vinayshekhar Bannihatti Kumar, Disha Makhija, Manoj Ghuhan Arivazhagan, Rashmi Gangadharaiah

TL;DR
This paper investigates whether large language models can transfer coding skills to an unseen language, revealing a gap between understanding algorithms and expressing them in new languages.
Contribution
The study introduces PyLang, a new minimal language, and demonstrates that fine-tuning improves syntax but not semantic transfer, highlighting the implementation fidelity gap.
Findings
Fine-tuning teaches syntax quickly but not semantics.
Models perform up to 19% better on Python than on PyLang.
Internal representations are similar across languages, but output diverges.
Abstract
Large language models (LLMs) achieve high pass rates on code generation benchmarks, yet whether they can transfer this ability to languages absent from pretraining remains poorly understood. We introduce PyLang, a minimal imperative language absent from all pretraining corpora, and evaluate frontier models zero-shot and fine-tuned Qwen3 (4B, 8B, 32B) on 352 problems. We find that fine-tuning quickly teaches syntax but fails to transfer semantic competence: Python outperforms PyLang by up to 19% across all configurations, and no intervention (multi-task learning, preference tuning, code infilling, or latent-space objectives) closes the gap. An LLM judge reveals that frontier models select an identical algorithm to Python 80% of the time, yet cannot translate it into a working PyLang implementation, and CKA analysis confirms that fine-tuned models converge to nearly identical internal…
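The abstract mentions CKA (centered kernel alignment) as the tool for comparing internal representations across languages. The listing does not say which CKA variant or which layers the authors use, so the following is only a minimal sketch of the common linear form, written in plain Python. The function names and the toy activation matrices (`acts_python`, `acts_pylang`) are hypothetical and are not taken from the paper.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature is zero-centered."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for two row-aligned matrices."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two activation matrices (one row per shared input).

    Assumes neither matrix is constant after centering; otherwise the
    denominator is zero.
    """
    if len(x) != len(y):
        raise ValueError("x and y need one row per shared input")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(
        cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc)
    )
    return numerator / denominator


# Hypothetical toy activations: 4 inputs, 2 hidden features each.
acts_python = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.5]]
# A rotated and rescaled copy of the same geometry (illustrative only).
acts_pylang = [[-2.0 * b, 2.0 * a] for a, b in acts_python]

score = linear_cka(acts_python, acts_pylang)
print(f"linear CKA = {score:.6f}")
```

Linear CKA is invariant to rotations and uniform rescaling of the feature space, so the toy pair above scores very close to 1.0 even though the raw numbers differ. A high score of this kind only says that two representations share geometry; it says nothing about whether the decoded output is correct, which is the distinction the findings draw.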
