Is Programming by Example solved by LLMs?
Wen-Ding Li, Kevin Ellis

TL;DR
This paper evaluates how well Large Language Models (LLMs) solve Programming-by-Examples (PBE) tasks. Fine-tuned models perform well on in-distribution problems but struggle on out-of-distribution ones.
Contribution
The study empirically analyzes LLMs on PBE tasks in list, string, and graphics programming domains, showing that fine-tuning is needed for strong performance and examining what causes the models to succeed and fail.
Findings
Pretrained LLMs are ineffective at PBE tasks without fine-tuning.
Fine-tuning significantly improves performance on in-distribution problems.
Out-of-distribution generalization remains a key challenge for LLMs in PBE.
Abstract
Programming-by-Examples (PBE) aims to generate an algorithm from input-output examples. Such systems are practically and theoretically important: from an end-user perspective, they are deployed to millions of people, and from an AI perspective, PBE corresponds to a very general form of few-shot inductive inference. Given the success of Large Language Models (LLMs) in code-generation tasks, we investigate here the extent to which LLMs can be said to have "solved" PBE. We experiment on classic domains such as lists and strings, and an uncommon graphics programming domain not well represented in typical pretraining data. We find that pretrained models are not effective at PBE, but that they can be fine-tuned for much higher performance, provided the test problems are in-distribution. We analyze empirically what causes these models to succeed and fail, and take steps toward understanding…
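To make the PBE setting concrete, here is a minimal sketch of the task itself: given input-output examples, find a program consistent with all of them. It is not the paper's method. The candidate pool, the names `CANDIDATES`, `consistent`, and `solve`, and the list operations are all hypothetical, with a hand-written pool standing in for programs that a model might propose.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One PBE example: an input list paired with the expected output list.
Example = Tuple[List[int], List[int]]

# Hypothetical candidate pool (an assumption for illustration only).
# In an LLM-based setup these would be sampled programs, not a fixed table.
CANDIDATES: Dict[str, Callable[[List[int]], List[int]]] = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def consistent(program: Callable[[List[int]], List[int]],
               examples: List[Example]) -> bool:
    """Return True if the program reproduces every given output."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example is rejected.
            return False
    return True


def solve(examples: List[Example]) -> Optional[str]:
    """Return the name of the first candidate consistent with all examples."""
    for name, program in CANDIDATES.items():
        if consistent(program, examples):
            return name
    return None


if __name__ == "__main__":
    print(solve([([3, 1, 2], [1, 2, 3]), ([2, 1], [1, 2])]))
```

Checking candidates against the examples is what lets a PBE system reject wrong programs automatically. A problem whose solution lies outside the pool simply returns `None`, which is a loose analogy for the out-of-distribution failures described above.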
Taxonomy
Topics: Advanced Data Storage Technologies
