Assessing the Interpretability of Programmatic Policies with Large Language Models
Zahra Bashir, Michael Bowling, Levi H. S. Lelis

TL;DR
This paper introduces a novel LLM-based metric to evaluate the interpretability of programmatic policies by comparing original and reconstructed programs through natural language explanations, validated on game policies.
Contribution
The paper presents a new automated method using large language models to quantify the interpretability of programmatic policies, addressing a gap in systematic evaluation.
Findings
The metric reliably ranks interpretability levels of different policies.
It correlates well with human judgments of interpretability.
The approach is cost-effective and scalable for policy evaluation.
Abstract
Although the synthesis of programs encoding policies often carries the promise of interpretability, systematic evaluations have never been performed to assess the interpretability of these policies, likely because of the complexity of such an evaluation. In this paper, we introduce a novel metric that uses large language models (LLMs) to assess the interpretability of programmatic policies. For our metric, an LLM is given both a program and a description of its associated programming language. The LLM then formulates a natural-language explanation of the program. This explanation is subsequently fed into a second LLM, which tries to reconstruct the program from the natural-language explanation. Our metric then measures the behavioral similarity between the reconstructed program and the original. We validate our approach with synthesized and human-crafted programmatic policies for playing a…
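The explain-then-reconstruct pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the function names `interpretability_score` and `behavioral_similarity`, the use of plain callables in place of the two LLMs, the toy "threshold" language, and the choice of action agreement over a fixed set of states as the similarity measure. The abstract only says that the metric measures behavioral similarity; it does not specify how.

```python
import re
from typing import Callable, Iterable

# A policy maps a state to an action. Integer states and string actions
# are a toy assumption for this sketch.
Policy = Callable[[int], str]


def behavioral_similarity(original: Policy, reconstructed: Policy,
                          states: Iterable[int]) -> float:
    """Fraction of states on which both policies choose the same action.

    Assumed similarity measure; the abstract does not define one.
    """
    states = list(states)
    if not states:
        raise ValueError("need at least one state to compare policies")
    matches = sum(original(s) == reconstructed(s) for s in states)
    return matches / len(states)


def interpretability_score(program: str,
                           language_description: str,
                           explain: Callable[[str, str], str],
                           reconstruct: Callable[[str], str],
                           compile_policy: Callable[[str], Policy],
                           states: Iterable[int]) -> float:
    """Explain the program, rebuild it from the explanation, compare behavior."""
    # First model: sees the program and the language description.
    explanation = explain(program, language_description)
    # Second model: sees only the natural-language explanation.
    rebuilt_program = reconstruct(explanation)
    return behavioral_similarity(compile_policy(program),
                                 compile_policy(rebuilt_program),
                                 states)


# --- Hypothetical stand-ins for the two LLMs and the program language ---

def compile_toy(program: str) -> Policy:
    """Toy language: 'threshold=N' means attack when the state exceeds N."""
    threshold = int(program.split("=")[1])
    return lambda s: "attack" if s > threshold else "retreat"


def faithful_explain(program: str, language_description: str) -> str:
    threshold = program.split("=")[1]
    return f"Attack when the state value exceeds {threshold}; otherwise retreat."


def vague_explain(program: str, language_description: str) -> str:
    return "Attack when the state value is high; otherwise retreat."


def toy_reconstruct(explanation: str) -> str:
    # Recover the threshold if the explanation mentions one, else guess 0.
    match = re.search(r"\d+", explanation)
    return f"threshold={match.group() if match else 0}"


if __name__ == "__main__":
    for explain in (faithful_explain, vague_explain):
        score = interpretability_score("threshold=3", "toy threshold language",
                                       explain, toy_reconstruct,
                                       compile_toy, range(10))
        print(explain.__name__, score)
```

In this toy setup, an explanation that preserves the threshold lets the reconstruction match the original on every state, while a vague explanation loses information and the score drops. In a real setting the two stand-in callables would be replaced by LLM calls and `compile_policy` by an interpreter for the actual policy language.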
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices · Explainable Artificial Intelligence (XAI)
