DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models

Jinxiang Xie; Yilin Li; Xunjian Yin; Xiaojun Wan

arXiv:2412.12832 · cs.CL · June 24, 2025


TL;DR

This paper introduces DSGram, an evaluation framework for grammatical error correction (GEC) that combines three sub-metrics (Semantic Coherence, Edit Level, and Fluency) using dynamic weights derived with large language models, and validates the approach against human annotations to improve assessment accuracy.

Contribution

We propose DSGram, an evaluation method that integrates multiple criteria through a dynamic weighting mechanism based on the Analytic Hierarchy Process (AHP) and LLMs, addressing the limitations of traditional reference-based metrics.
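In AHP, pairwise judgments of how much more important one criterion is than another are collected in a reciprocal matrix, and the normalized principal eigenvector of that matrix gives the criterion weights. The sketch below is a minimal, generic AHP computation, not the DSGram implementation. It assumes an LLM (or a human) has already filled in a 3×3 comparison matrix over the three sub-metrics named in the abstract; the function name `ahp_weights` and the example matrix values are hypothetical. The eigenvector is found by power iteration, and Saaty's consistency ratio is reported so contradictory judgments can be detected.

```python
# Saaty's random consistency index (RI) by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix.

    Hypothetical helper: matrix[i][j] is how much more important criterion i
    is than criterion j, with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    w = [1.0 / n] * n
    # Power iteration: repeatedly multiply by the matrix and renormalize
    # so the weights sum to 1; converges to the principal eigenvector.
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue lambda_max from A.w / w.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    # Consistency index and ratio (CR); CR is 0 for a perfectly consistent matrix.
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX.get(n, 1.12)
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical comparisons, order: semantic coherence, edit level, fluency.
example = [
    [1.0, 3.0, 2.0],
    [1 / 3, 1.0, 1 / 2],
    [1 / 2, 2.0, 1.0],
]
weights, cr = ahp_weights(example)
print([round(x, 3) for x in weights], round(cr, 3))
```

By the usual AHP convention, a consistency ratio above roughly 0.1 suggests the pairwise judgments should be revised before the weights are used.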

Findings

01 Enhanced evaluation accuracy for GEC models

02 Effective integration of semantic, edit, and fluency metrics

03 Validated with human annotations and LLM-simulated data

Abstract

Evaluating the performance of Grammatical Error Correction (GEC) models has become increasingly challenging, as large language model (LLM)-based GEC systems often produce corrections that diverge from provided gold references. This discrepancy undermines the reliability of traditional reference-based evaluation metrics. In this study, we propose a novel evaluation framework for GEC models, DSGram, integrating Semantic Coherence, Edit Level, and Fluency, and utilizing a dynamic weighting mechanism. Our framework employs the Analytic Hierarchy Process (AHP) in conjunction with large language models to ascertain the relative importance of various evaluation criteria. Additionally, we develop a dataset incorporating human annotations and LLM-simulated sentences to validate our algorithms and fine-tune more cost-effective models. Experimental results indicate that our proposed approach…
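Once weights are available for a given sentence, the sub-metric scores can be merged into a single score. The sketch below assumes a normalized weighted sum and assumes each sub-metric has already been scored on a 0 to 10 scale; both the aggregation rule and the name `combine_scores` are illustrative assumptions, not the paper's confirmed formula. It shows the point of dynamic weighting: identical sub-metric scores produce different overall scores when the weights change from one sentence to another.

```python
from typing import Mapping

# The three sub-metrics named in the abstract (dictionary keys are hypothetical).
SUB_METRICS = ("semantic_coherence", "edit_level", "fluency")


def combine_scores(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Normalized weighted sum of sub-metric scores (assumed aggregation rule)."""
    total_weight = sum(weights[m] for m in SUB_METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[m] * scores[m] for m in SUB_METRICS) / total_weight


# Hypothetical sub-metric scores (0-10) for one corrected sentence.
scores = {"semantic_coherence": 9.0, "edit_level": 5.0, "fluency": 8.0}

# Two hypothetical per-sentence weightings, e.g. as produced by AHP.
meaning_first = {"semantic_coherence": 0.5, "edit_level": 0.25, "fluency": 0.25}
minimal_edits_first = {"semantic_coherence": 0.25, "edit_level": 0.5, "fluency": 0.25}

print(combine_scores(scores, meaning_first))        # 7.75
print(combine_scores(scores, minimal_edits_first))  # 6.75
```

Normalizing by the weight total keeps the combined score on the same 0 to 10 scale even if the supplied weights do not sum exactly to 1.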


Code & Models

Repositories

jxtse/GEC-Metrics-DSGram (official)

Datasets

jxtse/DSGram · dataset · 7 downloads

Videos

DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling