How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments

Yusuke Ide; Yuto Nishida; Justin Vasselli; Miyu Oba; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe

arXiv:2408.09639·cs.CL·February 10, 2025

TL;DR

This paper explores methods for eliciting grammatical acceptability judgments from LLMs using prompts and templates. In experiments on English and Chinese, these methods, and ensembles of them, outperform conventional comparison of raw sentence probabilities.

Contribution

It compares prompt-based and probability readout methods for obtaining acceptability judgments from LLMs and shows that some of them are more accurate than conventional sentence-probability comparison.

Findings

01 Prompt-based methods outperform traditional probability comparisons.

02 Ensembling different methods yields higher accuracy.

03 Different methods perform best on different linguistic phenomena.
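
One simple way to combine several judgment methods is a majority vote over their per-pair choices. The sketch below is illustrative only: the function names and the toy choices are made up here, and this listing does not state which ensembling scheme the paper actually uses.

```python
from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common choice; ties go to the choice seen first."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_choices(per_method: Sequence[Sequence[int]]) -> list:
    """Combine the per-pair choices of several methods by majority vote.

    per_method[m][i] is the index (0 or 1) of the sentence that
    method m prefers for minimal pair i.
    """
    return [majority_vote(votes) for votes in zip(*per_method)]


def accuracy(choices: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of pairs where the chosen sentence is the acceptable one."""
    return sum(c == g for c, g in zip(choices, gold)) / len(gold)


# Toy data (not results from the paper): index 0 is always the acceptable sentence.
GOLD = [0, 0, 0, 0]
METHOD_A = [0, 0, 1, 0]
METHOD_B = [0, 1, 0, 0]
METHOD_C = [1, 0, 0, 0]

ENSEMBLE = ensemble_choices([METHOD_A, METHOD_B, METHOD_C])
```

In this constructed example each method errs on a different pair, so the vote corrects every individual mistake. Voting only helps when the methods' errors are not strongly correlated.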

Abstract

The grammatical knowledge of language models (LMs) is often measured using a benchmark of linguistic minimal pairs, where the LMs are presented with a pair of acceptable and unacceptable sentences and required to judge which is more acceptable. Conventional approaches directly compare sentence probabilities assigned by LMs, but recent large language models (LLMs) are trained to perform tasks via prompting, and thus the raw probabilities they assign may not fully reflect their grammatical knowledge. In this study, we attempt to derive more accurate acceptability judgments from LLMs using prompts and templates. Through extensive experiments in English and Chinese, we compare nine judgment methods and find that two of them, a probability readout method (in-template LP) and a prompt-based method (Yes/No probability computing), achieve higher accuracy than the conventional ones. Our analysis…
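
As a rough illustration of the conventional setup described in the abstract, the sketch below scores both sentences of a minimal pair and counts the pair as correct when the acceptable sentence receives the higher log-probability. The toy unigram table stands in for a real LM, and all names are hypothetical. The `yes_probability` helper shows one plausible reading of a Yes/No readout (normalizing over the two answers only); the abstract does not define the method, so treat that part as an assumption.

```python
import math
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for an LM: a toy unigram table of token probabilities.
# A real setup would read token log-probabilities from an actual model.
TOY_UNIGRAM = {"the": 0.2, "cat": 0.1, "cats": 0.05, "sleeps": 0.1, "sleep": 0.02}
UNK_PROB = 1e-4  # probability assigned to tokens missing from the table


def toy_sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities under the toy unigram table."""
    tokens = sentence.lower().split()
    return sum(math.log(TOY_UNIGRAM.get(tok, UNK_PROB)) for tok in tokens)


def prefers_acceptable(
    logprob: Callable[[str], float], acceptable: str, unacceptable: str
) -> bool:
    """True if the scorer gives the acceptable sentence the higher score."""
    return logprob(acceptable) > logprob(unacceptable)


def accuracy(
    logprob: Callable[[str], float], pairs: Iterable[Tuple[str, str]]
) -> float:
    """Fraction of (acceptable, unacceptable) pairs judged correctly."""
    results = [prefers_acceptable(logprob, a, u) for a, u in pairs]
    return sum(results) / len(results)


def yes_probability(yes_logprob: float, no_logprob: float) -> float:
    """Assumed Yes/No readout: P(Yes) renormalized over {Yes, No} only."""
    m = max(yes_logprob, no_logprob)  # subtract the max for numerical stability
    yes = math.exp(yes_logprob - m)
    no = math.exp(no_logprob - m)
    return yes / (yes + no)


PAIRS = [
    ("the cat sleeps", "the cat sleep"),
    ("the cats sleep", "the cats sleeps"),
]
```

With the toy table, the first pair is judged correctly and the second is not, because a unigram scorer cannot see subject-verb agreement. Under the assumed Yes/No readout, a pair would be judged by comparing `yes_probability` for the two sentences instead of their raw log-probabilities.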

Taxonomy

Topics: Taxation and Legal Issues · Comparative and International Law Studies