Evaluating the Correctness of Inference Patterns Used by LLMs for Judgment

Lu Chen; Yuxuan Huang; Yixing Li; Dongrui Liu; Qihan Ren; Shuai Zhao; Kun Kuang; Zilong Zheng; Quanshi Zhang

arXiv:2410.09083 · cs.AI · May 21, 2025

TL;DR

This paper introduces a method for analyzing the inference patterns of legal Large Language Models. It shows that seemingly correct outputs may rest on misleading or irrelevant inference patterns, which makes it important to examine how an LLM reaches a judgment and not only what it outputs.

Contribution

The paper proposes a new interaction-based evaluation framework for analyzing the correctness of inference patterns in LLMs, with a focus on legal judgment tasks.

Findings

01 Many inference patterns are misleading or irrelevant despite correct outputs

02 The proposed metrics effectively quantify inference pattern correctness

03 Legal LLMs often rely on incorrect reasoning structures

Abstract

This paper presents a method to analyze the inference patterns that Large Language Models (LLMs) use for judgment, with legal LLMs as a case study, so as to identify potentially incorrect representations in the LLM according to human domain knowledge. Unlike traditional evaluations of language generation results, we propose to evaluate the correctness of the detailed inference patterns behind an LLM's seemingly correct outputs. To this end, we quantify the interactions between input phrases used by the LLM as primitive inference patterns, because recent theoretical work has proven several mathematical guarantees of the faithfulness of interaction-based explanations. We design a set of metrics to evaluate the detailed inference patterns of LLMs. Experiments show that even when the language generation results appear correct, a significant portion of the inference patterns used…
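To make "interactions between input phrases" concrete, here is a minimal sketch of one common way to define them. It assumes the Harsanyi dividend, I(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T), where v(T) is the model's output score when only the phrases in T are kept. The abstract does not spell out the exact formula, so this choice is an assumption. The scoring function `toy_score`, the phrase list, the `relevant` labels, and the `reliable_share` ratio are all hypothetical stand-ins for a real LLM, real annotations, and the paper's own metrics.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(phrases, v):
    """Return {S: I(S)} with I(S) = sum_{T subset of S} (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(phrases)
    }


def toy_score(present):
    """Hypothetical stand-in for an LLM's confidence in a 'theft' judgment.

    `present` is the set of input phrases left unmasked.
    """
    score = 0.0
    if {"took", "wallet"} <= present:  # legally relevant joint pattern
        score += 2.0
    if "rainy" in present:             # legally irrelevant cue
        score += 0.5
    return score


phrases = ["took", "wallet", "rainy"]
effects = harsanyi_interactions(phrases, toy_score)

# Hypothetical annotation: phrases a domain expert marks as relevant.
relevant = frozenset({"took", "wallet"})

# Illustrative ratio (not the paper's metric): share of total interaction
# strength carried by patterns built only from relevant phrases.
total = sum(abs(e) for e in effects.values())
reliable = sum(abs(e) for S, e in effects.items() if S and S <= relevant)
reliable_share = reliable / total
```

In this toy setup the output score with all phrases present looks fine, yet part of it comes from the irrelevant phrase "rainy". That is the kind of gap between a correct-looking output and the patterns behind it that the abstract describes. Enumerating all subsets costs 2^n model evaluations, so the sketch is only practical for a handful of phrases.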


Taxonomy

Topics: Law, AI, and Intellectual Property · Legal Education and Practice Innovations · Artificial Intelligence in Law

Methods: Sparse Evolutionary Training