Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability

Yujin Han; Lei Xu; Sirui Chen; Difan Zou; Chaochao Lu

arXiv:2411.19456 · cs.CL · December 2, 2024

TL;DR

This paper introduces a causal framework to evaluate whether large language models truly understand deep semantics or rely on surface cues, revealing that most models possess some degree of deep structure comprehension, which varies with model type and size.

Contribution

The paper develops a novel causal mediation analysis approach with approximated direct and indirect effects to assess LLMs' deep and surface structure understanding, providing a more comprehensive evaluation method.

Findings

1. Most mainstream LLMs show deep structure comprehension that grows with their prediction accuracy.
2. Closed-source LLMs rely more on deep structure, while open-source models are more surface-sensitive.
3. Deep structure comprehension increases with model scale.

Abstract

Large language models (LLMs) have shown remarkable capability in natural language tasks, yet debate persists on whether they truly comprehend deep structure (i.e., core semantics) or merely rely on surface structure (e.g., presentation format). Prior studies observe that LLMs' performance declines when intervening on surface structure, arguing their success relies on surface structure recognition. However, surface structure sensitivity does not prevent deep structure comprehension. Rigorously evaluating LLMs' capability requires analyzing both, yet deep structure is often overlooked. To this end, we assess LLMs' comprehension ability using causal mediation analysis, aiming to fully discover the capability of using both deep and surface structures. Specifically, we formulate the comprehension of deep structure as direct causal effect (DCE) and that of surface structure as indirect causal…
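For readers new to the direct/indirect split that the abstract relies on, the following is a minimal Python sketch of natural direct and indirect effects in a toy linear structural causal model with a treatment X, a mediator M, and an outcome Y. It is an illustrative assumption, not the paper's estimator: the class LinearSCM and its coefficients a, b and c are hypothetical, and the noise terms are omitted so the effects can be read off exactly. Loosely, the direct path X → Y corresponds to what the paper attributes to deep structure, and the mediated path X → M → Y to what it attributes to surface structure.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Hypothetical noise-free linear SCM: M = a*X, Y = c*X + b*M."""
    a: float  # effect of treatment X on mediator M
    b: float  # effect of mediator M on outcome Y
    c: float  # direct effect of X on Y (bypassing M)

    def mediator(self, x: float) -> float:
        return self.a * x

    def outcome(self, x: float, m: float) -> float:
        return self.c * x + self.b * m

    def total_effect(self, x0: float, x1: float) -> float:
        # Change X and let M respond to the new value of X.
        return self.outcome(x1, self.mediator(x1)) - self.outcome(x0, self.mediator(x0))

    def natural_direct_effect(self, x0: float, x1: float) -> float:
        # Change X while holding M at the value it would take under x0.
        m0 = self.mediator(x0)
        return self.outcome(x1, m0) - self.outcome(x0, m0)

    def natural_indirect_effect(self, x0: float, x1: float) -> float:
        # Hold X at x0 while letting M move to the value it would take under x1.
        return self.outcome(x0, self.mediator(x1)) - self.outcome(x0, self.mediator(x0))


if __name__ == "__main__":
    scm = LinearSCM(a=2.0, b=0.5, c=1.5)
    print(scm.total_effect(0.0, 1.0))
    print(scm.natural_direct_effect(0.0, 1.0))
    print(scm.natural_indirect_effect(0.0, 1.0))
```

In this linear model without interactions, the direct effect (1.5) and the indirect effect (1.0) add up exactly to the total effect (2.5). With nonlinear outcome models the two effects need not sum to the total effect.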

Code & Models

Repositories

OpenCausaLab/ADCE · PyTorch · Official

Taxonomy

Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law