Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization
Yuntian Liu, Tao Zhu, Xiaoyang Liu, Yu Chen, Zhaoxuan Liu, Qingfeng Guo, Jiashuo Zhang, Kangjie Bao, Tao Luo

TL;DR
GTED introduces a new evaluation metric for statement autoformalization that is computationally efficient and semantically faithful, outperforming existing metrics on key benchmarks.
Contribution
The paper presents GTED, a novel semantic-aware evaluation framework for autoformalization that standardizes statements into operator trees and measures their similarity, improving over prior metrics.
Findings
GTED achieves top accuracy and Kappa on miniF2F.
GTED attains joint-highest accuracy on ProofNet.
GTED is computationally lightweight and more faithful.
Abstract
Statement autoformalization, the automated translation of statements from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose GTED (Generalized Tree Edit Distance), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. Across the miniF2F and ProofNet benchmarks, GTED consistently ranks as a top-performing metric, achieving the highest accuracy and Kappa on miniF2F and the joint-highest accuracy on ProofNet. This strong overall performance…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
