How Interpretable are Reasoning Explanations from Prompting Large Language Models?

Wei Jie Yeo; Ranjan Satapathy; Rick Siow Mong Goh; Erik Cambria

arXiv:2402.11863 · cs.CL · October 21, 2024 · 5 cites

TL;DR

This paper evaluates the interpretability of reasoning explanations from large language models, considering multiple dimensions beyond faithfulness, and introduces a new alignment technique that significantly improves interpretability.

Contribution

It provides a comprehensive, multifaceted evaluation of reasoning explanations and proposes a novel Self-Entailment-Alignment Chain-of-Thought method that enhances interpretability.

Findings

1. Over 70% improvement in interpretability metrics
2. Evaluation across multiple reasoning benchmarks
3. Analysis of various prompting techniques

Abstract

Prompt Engineering has garnered significant attention for enhancing the performance of large language models across a multitude of tasks. Techniques such as the Chain-of-Thought not only bolster task performance but also delineate a clear trajectory of reasoning steps, offering a tangible form of explanation for the audience. Prior works on interpretability assess the reasoning chains yielded by Chain-of-Thought solely along a singular axis, namely faithfulness. We present a comprehensive and multifaceted evaluation of interpretability, examining not only faithfulness but also robustness and utility across multiple commonsense reasoning benchmarks. Likewise, our investigation is not confined to a single prompting technique; it expansively covers a multitude of prevalent prompting techniques employed in large language models, thereby ensuring a wide-ranging and exhaustive evaluation. In…

Code & Models

Repositories

wj210/cot_interpretability
PyTorch · Official

Videos

How Interpretable are Reasoning Explanations from Prompting Large Language Models? · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques