Statistical investigations into the geometry and homology of random programs
Jon Sporring, Ken Friis Larsen

TL;DR
This paper explores the geometric and topological structure of random programs generated by language models like ChatGPT, using novel statistical and topological methods to analyze their syntax trees without relying on embedding techniques.
Contribution
It introduces a new approach employing geometric summary statistics and topological data analysis to characterize random program distributions, avoiding errors from traditional embedding methods.
Findings
Compared ChatGPT-4 and TinyLlama on image processing tasks.
Demonstrated the usefulness of topological methods in analyzing program structures.
Provided insights into model consistency and differences in program generation.
Abstract
AI-supported programming has taken giant leaps with tools such as Meta's Llama and OpenAI's ChatGPT. These are examples of stochastic sources of programs and have already greatly influenced how we produce code and teach programming. If we consider input to such models as a stochastic source, a natural question is: what is the relation between the input and the output distributions, between the ChatGPT prompt and the resulting program? In this paper, we will show how the relation between random Python programs generated from ChatGPT can be described geometrically and topologically using tree-edit distances between the programs' syntax trees and without explicit modeling of the underlying space. A popular approach to studying high-dimensional samples in a metric space is to use a low-dimensional embedding obtained by, e.g., multidimensional scaling. Such methods imply errors depending on the…
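To make the notion of a tree-edit distance between syntax trees concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`to_tree`, `forest_dist`, `tree_edit_distance`), the unit costs for insert, delete, and relabel, and the choice to label each node by its AST node type only (ignoring identifiers and literal values) are all assumptions made for illustration. It uses the standard memoized forest recursion for ordered trees, which is only practical for small programs.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert an ast node into a nested (label, children) tuple.

    Assumption: the label is the node's type name only, so identifiers
    and constants do not contribute to the distance.
    """
    return (type(node).__name__,
            tuple(to_tree(child) for child in ast.iter_child_nodes(node)))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # delete the root of the rightmost tree in f1
        forest_dist(f1[:-1] + kids1, f2) + 1,
        # insert the root of the rightmost tree in f2
        forest_dist(f1, f2[:-1] + kids2) + 1,
        # match the two rightmost roots, relabeling if they differ
        forest_dist(f1[:-1], f2[:-1]) + forest_dist(kids1, kids2)
        + (label1 != label2),
    )


def tree_edit_distance(src_a, src_b):
    """Edit distance between the syntax trees of two Python sources."""
    return forest_dist((to_tree(ast.parse(src_a)),),
                       (to_tree(ast.parse(src_b)),))


if __name__ == "__main__":
    print(tree_edit_distance("x = 1", "x = 1 + 2"))
```

Applying `tree_edit_distance` to every pair in a sample of generated programs gives a pairwise distance matrix, the kind of input on which summary statistics or topological analyses can operate without an explicit embedding.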
Taxonomy
Topics: Computability, Logic, AI Algorithms · Machine Learning and Algorithms
Methods: LLaMA
