Toward Automated and Trustworthy Scientific Analysis and Visualization with LLM-Generated Code
Apu Kumar Chakroborti, Yi Ding, Lipeng Wan

TL;DR
This paper evaluates the trustworthiness of LLMs in generating Python code for scientific analysis and visualization, proposing strategies to improve reliability and introducing a benchmark for future research.
Contribution
It systematically assesses LLM-generated scientific code, identifies reliability issues, and proposes three strategies to enhance code correctness and trustworthiness.
Findings
LLMs often produce unreliable code without human intervention.
Prompt disambiguation and retrieval augmentation improve code success rates.
The benchmark facilitates future evaluation of AI tools in scientific workflows.
Abstract
As modern science becomes increasingly data-intensive, the ability to analyze and visualize large-scale, complex datasets is critical to accelerating discovery. However, many domain scientists lack the programming expertise required to develop custom data analysis workflows, creating barriers to timely and effective insight. Large language models (LLMs) offer a promising solution by generating executable code from natural language descriptions. In this paper, we investigate the trustworthiness of open-source LLMs in autonomously producing Python scripts for scientific data analysis and visualization. We construct a benchmark suite of domain-inspired prompts that reflect real-world research tasks and systematically evaluate the executability and correctness of the generated code. Our findings show that, without human intervention, the reliability of LLM-generated code is limited, with…
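As an illustration of what checking executability can involve, the following is a minimal sketch and not the paper's actual evaluation harness. It assumes each generated script is available as a string, and it treats a script as executable when it runs in a fresh Python subprocess and exits with code 0 within a time limit. The function names `is_executable` and `success_rate` and the 30-second default timeout are hypothetical choices made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def is_executable(script_source: str, timeout_s: float = 30.0) -> bool:
    """Return True if the script runs to completion with exit code 0.

    Assumption: "executable" means a clean exit within the timeout.
    """
    # Run inside a throwaway directory so files written by the script
    # (plots, CSVs) do not leak into the caller's working directory.
    with tempfile.TemporaryDirectory() as workdir:
        script_path = Path(workdir) / "generated.py"
        script_path.write_text(script_source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script_path)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A script that hangs is counted as a failure.
            return False
        return result.returncode == 0


def success_rate(scripts: list[str]) -> float:
    """Fraction of scripts that execute successfully (0.0 for an empty list)."""
    if not scripts:
        return 0.0
    return sum(is_executable(s) for s in scripts) / len(scripts)
```

A clean exit only shows that the script ran. Judging correctness would additionally require comparing the script's outputs against a reference, which this sketch does not attempt.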
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Computational Physics and Python Applications
