evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments

Fatou Ndiaye Mbodji

arXiv:2507.20774 · cs.AI · July 29, 2025

TL;DR

evalSmarT is a modular framework that uses large language models (LLMs) to evaluate generated smart contract comments. It addresses the limitations of traditional metrics such as BLEU and ROUGE and enables scalable, semantically aware assessment.

Contribution

The paper introduces evalSmarT, a modular and extensible framework that combines approximately 40 LLMs with 10 prompting strategies to evaluate generated smart contract comments, offering a semantically richer alternative to traditional metrics such as BLEU and ROUGE.

Findings

01. Prompt design greatly influences evaluation quality.

02. LLM-based evaluation aligns well with human judgment.

03. Framework supports over 400 evaluator configurations.

Abstract

Smart contract comment generation has gained traction as a means to improve code comprehension and maintainability in blockchain systems. However, evaluating the quality of generated comments remains a challenge. Traditional metrics such as BLEU and ROUGE fail to capture domain-specific nuances, while human evaluation is costly and unscalable. In this paper, we present evalSmarT, a modular and extensible framework that leverages large language models (LLMs) as evaluators. The system supports over 400 evaluator configurations by combining approximately 40 LLMs with 10 prompting strategies. We demonstrate its application in benchmarking comment generation tools and selecting the most informative outputs. Our results show that prompt design significantly impacts alignment with human judgment, and that LLM-based evaluation offers a scalable and semantically rich alternative to…
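The configuration count in the abstract follows from pairing every model with every prompting strategy. The sketch below illustrates that arithmetic only; it is not the paper's implementation. The model and strategy names are hypothetical placeholders, since the abstract gives counts but no identifiers, and the helper `evaluator_configs` is likewise an assumption.

```python
from itertools import product

# Hypothetical placeholder identifiers: the abstract states only the
# approximate counts (about 40 LLMs, 10 prompting strategies).
models = [f"llm-{i:02d}" for i in range(1, 41)]
strategies = [f"prompt-{j:02d}" for j in range(1, 11)]


def evaluator_configs(models, strategies):
    """Return every (model, strategy) pair as one evaluator configuration."""
    return list(product(models, strategies))


configs = evaluator_configs(models, strategies)
print(len(configs))  # 40 * 10 = 400
```

With exactly 40 models and 10 strategies the product is 400, so the "over 400" figure implies that the model pool is slightly larger than 40.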
