PBE Meets LLM: When Few Examples Aren't Few-Shot Enough
Shuning Zhang, Yongjoo Park

TL;DR
This paper evaluates large language models on programming by example tasks involving tabular data transformations, comparing their performance with traditional methods and proposing a hybrid approach for improved accuracy.
Contribution
It introduces a comprehensive evaluation of LLMs on PBE tasks, compares prompting strategies, and proposes a hybrid method combining traditional solvers with LLMs.
Findings
LLMs support diverse input formats and outperform conventional methods.
LLM performance drops on tasks whose examples are ambiguous.
The hybrid approach improves the overall success rate.
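One way to picture a hybrid of a traditional solver and an LLM is a chain of synthesizers tried in turn, keeping the first program that reproduces every given example. The sketch below is only an illustration under assumptions: the summary does not say in which order the components run or how they are combined, and `toy_solver`, `toy_llm`, and `hybrid_synthesize` are hypothetical stand-ins, not the paper's implementation.

```python
from typing import Callable, List, Optional, Tuple

Table = List[List[str]]
Example = Tuple[Table, Table]
Program = Callable[[Table], Table]
Synthesizer = Callable[[List[Example]], Optional[Program]]


def consistent(program: Program, examples: List[Example]) -> bool:
    """Return True if the program reproduces every example output."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            return False
    return True


def hybrid_synthesize(examples: List[Example],
                      synthesizers: List[Synthesizer]) -> Optional[Program]:
    """Try each synthesizer in order; keep the first consistent program.

    The ordering is an assumption made for this sketch.
    """
    for synthesize in synthesizers:
        program = synthesize(examples)
        if program is not None and consistent(program, examples):
            return program
    return None


def toy_solver(examples: List[Example]) -> Optional[Program]:
    """Hypothetical stand-in for a search-based solver with a tiny
    transformation space: it only knows the identity transformation."""
    identity: Program = lambda table: table
    return identity if consistent(identity, examples) else None


def toy_llm(examples: List[Example]) -> Optional[Program]:
    """Hypothetical stand-in for an LLM call: always proposes a transpose."""
    return lambda table: [list(col) for col in zip(*table)]


examples = [([["a", "b"], ["c", "d"]], [["a", "c"], ["b", "d"]])]
program = hybrid_synthesize(examples, [toy_solver, toy_llm])
```

Here the toy solver cannot explain the example, so the chain falls through to the second synthesizer. A real system would replace both stand-ins with an actual PBE solver and an actual model call.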
Abstract
Large language models (LLMs) can generate code from natural language descriptions. Their performance is typically evaluated using programming benchmarks that simulate real-world tasks. These benchmarks provide specifications in the form of docstrings, function signatures, or bug reports. The model then generates a program, which is tested against predefined test cases. In contrast, Programming by Example (PBE) uses input-output examples as the specification. Traditional PBE systems rely on search-based methods over restricted transformation spaces. They are usually designed for narrow domains and fixed input formats. It remains unclear how well LLMs perform on PBE tasks. In this work, we evaluate LLMs on PBE tasks involving tabular data transformations. We prompt models to generate functions that convert an input table to an output table. We test the generated functions on unseen…
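The evaluation setup described in the abstract can be sketched as follows: a generated function maps an input table to an output table, and it is checked on examples it was not shown. This is a minimal sketch under assumptions: the abstract is truncated, so reading "unseen" as held-out input-output pairs is a guess, and `candidate_transform` is a hypothetical stand-in for a model-generated function.

```python
from typing import Callable, List, Tuple

Table = List[List[str]]
Example = Tuple[Table, Table]


def candidate_transform(table: Table) -> Table:
    """Hypothetical model-generated function: swap the first two columns."""
    return [[row[1], row[0]] + row[2:] for row in table]


def passes_all(fn: Callable[[Table], Table], examples: List[Example]) -> bool:
    """Return True if fn maps every input table to its expected output.

    A crash inside fn counts as a failure rather than stopping the run.
    """
    for inp, expected in examples:
        try:
            if fn(inp) != expected:
                return False
        except Exception:
            return False
    return True


# Examples that would appear in the prompt (illustrative data).
prompt_examples: List[Example] = [
    ([["a", "1"], ["b", "2"]], [["1", "a"], ["2", "b"]]),
]

# Examples kept back to check generalization (illustrative data).
held_out_examples: List[Example] = [
    ([["x", "9", "k"]], [["9", "x", "k"]]),
]

fits_prompt = passes_all(candidate_transform, prompt_examples)
generalizes = passes_all(candidate_transform, held_out_examples)
```

Separating the two example sets matters because a function can fit the prompt examples while encoding the wrong transformation, which only shows up on inputs it has not seen.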
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
