Solving Probability and Statistics Problems by Program Synthesis
Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, and Iddo Drori

TL;DR
This paper demonstrates how large language models like Codex can be used to solve university-level probability and statistics problems by transforming questions into executable code, showcasing a novel application of program synthesis in education.
Contribution
It introduces a new dataset of probability and statistics problems and presents a method to solve them using program synthesis with Codex, including prompt engineering and similarity measurement.
Findings
Successfully solved university probability questions using code generation
Developed a method to transform questions into executable programs
Established a new dataset for probability and statistics problem solving
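To make the idea of turning a question into an executable program concrete, here is a minimal sketch of the kind of simulation program such a pipeline might produce. The question ("Two fair six-sided dice are rolled; what is the probability that the sum is 7?") is a hypothetical illustration and is not taken from the paper's dataset. The function name, trial count, and seed are likewise assumptions made for this example, not the authors' generated code.

```python
import random


def estimate_sum_probability(target=7, trials=200_000, seed=0):
    """Estimate P(sum of two fair dice == target) by Monte Carlo simulation."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    hits = 0
    for _ in range(trials):
        # Roll two independent fair six-sided dice.
        if rng.randint(1, 6) + rng.randint(1, 6) == target:
            hits += 1
    # The fraction of successful trials approximates the probability.
    return hits / trials


if __name__ == "__main__":
    # The exact answer for a sum of 7 is 6/36 = 1/6.
    print(estimate_sum_probability())
```

With a large number of trials the estimate should land close to the exact value of 1/6, which is why simulation is a reasonable fallback when a closed-form derivation is hard to generate.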
Abstract
We solve university level probability and statistics questions by program synthesis using OpenAI's Codex, a Transformer trained on text and fine-tuned on code. We transform course problems from MIT's 18.05 Introduction to Probability and Statistics and Harvard's STAT110 Probability into programming tasks. We then execute the generated code to get a solution. Since these course questions are grounded in probability, we often aim to have Codex generate probabilistic programs that simulate a large number of probabilistic dependencies to compute its solution. Our approach requires prompt engineering to transform the question from its original form to an explicit, tractable form that results in a correct program and solution. To estimate the amount of work needed to translate an original question into its tractable form, we measure the similarity between original and transformed questions.…
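The similarity between an original and a transformed question can be sketched as follows. The text above does not say which similarity measure is used, so this example substitutes a simple bag-of-words cosine similarity as a stand-in. Both question strings and the helper names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens the two strings share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string shares nothing with any other string
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical question pair, not taken from the paper's dataset.
    original = "Two dice are rolled. What is the probability the sum is 7?"
    transformed = (
        "Write a program that simulates rolling two fair six-sided dice "
        "many times and prints the fraction of rolls whose sum is 7."
    )
    print(cosine_similarity(original, transformed))
```

A score near 1 would suggest the question needed little rewording, while a lower score would suggest more prompt engineering effort. A measure based on learned sentence embeddings could be swapped in without changing the overall shape of the comparison.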
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Adam · Dense Connections · Layer Normalization · Absolute Position Encodings · Multi-Head Attention · Label Smoothing · Byte Pair Encoding
