Benchmarking graph construction by large language models for coherence-driven inference

Steve Huntsman; Jewell Thomas

arXiv:2502.13953·cs.AI·August 21, 2025

TL;DR

This paper introduces a benchmark of how well large language models reconstruct coherence graphs from propositions expressed in natural language, and reports promising results from a single prompt to reasoning-optimized models.

Contribution

It presents an algorithm for generating propositions that instantiate coherence graphs and benchmarks LLMs' ability to reconstruct these graphs from language.

Findings

01 LLMs can reconstruct coherence graphs from propositions expressed in natural language

02 A single prompt to a reasoning-optimized LLM yields promising results

03 o1/3/4-mini achieve perfect reconstruction half of the time on sparse graphs

Abstract

We devise an algorithm to generate propositions that objectively instantiate graphs supporting coherence-driven inference. We also benchmark the ability of large language models (LLMs) to reconstruct coherence graphs from (a simple transformation of) propositions expressed in natural language, with promising results from a single prompt to reasoning-optimized LLMs. For example, o1/3/4-mini achieve perfect reconstruction half of the time on sparse graphs. Coherence-driven inference on consistency evaluations by LLMs may advance machine cognition capabilities.
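
To make the terms in the abstract concrete, the sketch below shows one common reading of coherence-driven inference: propositions are graph nodes, positive edges link propositions that fit together, negative edges link propositions that conflict, and inference looks for an accept/reject split that satisfies the most constraint weight. It also shows a strict notion of "perfect reconstruction" as an exact match of signed edges. This is an illustrative assumption, not the authors' implementation: the function names, the edge-dictionary format, the brute-force search, and the scoring rule are all hypothetical.

```python
from itertools import product

def coherence(edges, assignment):
    """Sum the weight of satisfied constraints.

    A positive edge is satisfied when both endpoints share a status
    (both accepted or both rejected); a negative edge is satisfied
    when the endpoints differ.
    """
    total = 0.0
    for (u, v), w in edges.items():
        same = assignment[u] == assignment[v]
        if (w > 0 and same) or (w < 0 and not same):
            total += abs(w)
    return total

def best_partition(nodes, edges):
    """Brute-force search over all accept/reject assignments.

    Exponential in the number of nodes, so only suitable for tiny graphs.
    """
    best, best_score = None, -1.0
    for bits in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, bits))
        score = coherence(edges, assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

def exact_reconstruction(true_edges, predicted_edges):
    """Hypothetical strict metric: same undirected edges with the same signs."""
    def sign(w):
        return (w > 0) - (w < 0)

    def normalize(edges):
        return {frozenset(pair): sign(w) for pair, w in edges.items() if w != 0}

    return normalize(true_edges) == normalize(predicted_edges)

# Toy graph: a and b cohere, b and c conflict.
nodes = ["a", "b", "c"]
true_edges = {("a", "b"): 1.0, ("b", "c"): -1.0}
# A reconstruction that lists the same edges with endpoints swapped.
predicted_edges = {("b", "a"): 1.0, ("c", "b"): -1.0}

assignment, score = best_partition(nodes, true_edges)
print(assignment, score)
print(exact_reconstruction(true_edges, predicted_edges))
```

On the toy graph, the best split places a and b on one side and c on the other, which satisfies both constraints. A benchmark in this spirit would compare a model's predicted edges against the known graph for many generated instances.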

Taxonomy

Topics: Neural Networks and Applications