Benchmarking graph construction by large language models for coherence-driven inference
Steve Huntsman, Jewell Thomas

TL;DR
This paper introduces a benchmark of how well large language models reconstruct coherence graphs from propositions expressed in natural language, and reports promising results for coherence-driven inference.
Contribution
It presents an algorithm for generating propositions that instantiate coherence graphs and benchmarks LLMs' ability to reconstruct these graphs from language.
Findings
LLMs can reconstruct coherence graphs from (a simple transformation of) propositions expressed in natural language
A single prompt to a reasoning-optimized LLM yields promising results
o1/3/4-mini achieve perfect reconstruction half of the time on sparse graphs
Abstract
We devise an algorithm to generate propositions that objectively instantiate graphs supporting coherence-driven inference. We also benchmark the ability of large language models (LLMs) to reconstruct coherence graphs from (a simple transformation of) propositions expressed in natural language, with promising results from a single prompt to reasoning-optimized LLMs. For example, o1/3/4-mini achieve perfect reconstruction half of the time on sparse graphs. Coherence-driven inference on consistency evaluations by LLMs may advance machine cognition capabilities.
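As a rough illustration of the objects involved, the sketch below assumes the common constraint-satisfaction formulation of coherence, which this listing does not spell out: a coherence graph is a signed graph over propositions, a positive edge is satisfied when both endpoints are on the same side of an accepted/rejected partition, and a negative edge is satisfied when they are on opposite sides. The function names (`coherence`, `best_partition`, `perfect_reconstruction`) and the toy graph are hypothetical and are not taken from the paper; the exact-match check is only one plausible reading of "perfect reconstruction".

```python
from itertools import product

def coherence(edges, accepted):
    """Total weight of satisfied constraints for a given accepted set.

    edges maps (u, v) pairs to a signed weight (assumed convention):
    positive = the propositions cohere, negative = they conflict.
    """
    total = 0.0
    for (u, v), w in edges.items():
        same_side = (u in accepted) == (v in accepted)
        if (w > 0 and same_side) or (w < 0 and not same_side):
            total += abs(w)
    return total

def best_partition(nodes, edges):
    """Brute-force search over all accepted/rejected splits (small graphs only)."""
    best_score, best_set = -1.0, frozenset()
    for bits in product([False, True], repeat=len(nodes)):
        accepted = frozenset(n for n, b in zip(nodes, bits) if b)
        score = coherence(edges, accepted)
        if score > best_score:
            best_score, best_set = score, accepted
    return best_score, best_set

def normalize(edges):
    """Reduce a graph to undirected edges labeled only by sign."""
    return {frozenset(k): (1 if w > 0 else -1) for k, w in edges.items() if w != 0}

def perfect_reconstruction(truth, guess):
    """True when the guessed graph has exactly the same signed edges as the truth."""
    return normalize(truth) == normalize(guess)

# Toy example: a and b cohere, and both conflict with c.
nodes = ["a", "b", "c"]
truth = {("a", "b"): 1.0, ("b", "c"): -1.0, ("a", "c"): -1.0}
score, accepted = best_partition(nodes, truth)

# A hypothetical LLM output with the same edges listed in a different order.
guess = {("b", "a"): 1.0, ("c", "b"): -1.0, ("a", "c"): -1.0}
print(score, sorted(accepted), perfect_reconstruction(truth, guess))
```

The brute-force search is exponential in the number of propositions, so it is only meant to make the definition concrete on toy graphs. The partition is symmetric: swapping the accepted and rejected sets gives the same score.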
Taxonomy
Topics: Neural Networks and Applications
