General Purpose Verification for Chain of Thought Prompting

Robert Vacareanu; Anurag Pratik; Evangelia Spiliopoulou; Zheng Qi; Giovanni Paolini; Neha Anna John; Jie Ma; Yassine Benajiba; Miguel Ballesteros

arXiv:2405.00204 · cs.CL · May 2, 2024 · 3 cites

Open Access

TL;DR

This paper enhances the reasoning capabilities of Large Language Models by applying verification principles to their generated reasoning steps, improving accuracy across multiple reasoning tasks.

Contribution

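It introduces a verification-based approach in which the model checks its own reasoning steps against three constraints (relevance, mathematical accuracy, and logical consistency) and uses the perplexity of each step as an additional verifier, to improve LLM reasoning performance.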

Findings

01

Outperforms vanilla generation on all tested datasets.

02

Surpasses best-of-N sampling on 6 out of 9 datasets.

03

Improves reasoning accuracy through step verification and perplexity scoring.

Abstract

Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards high-quality solutions, we use the perplexity of the reasoning steps as an additional verifier. We…
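The abstract's idea of scoring candidate reasoning steps with verifiers plus perplexity can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `CandidateStep` type, the `pick_step` selection rule (most verifiers passed, ties broken by lower perplexity), and the stub verifiers, which stand in for the prompts where the model is asked to check its own steps. Only the perplexity formula, the exponential of the negative mean token log-probability, is standard.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CandidateStep:
    """One candidate reasoning step (hypothetical container, not from the paper)."""
    text: str
    token_logprobs: List[float]  # natural-log probability of each generated token


def perplexity(token_logprobs: List[float]) -> float:
    """Standard perplexity: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# A verifier maps a step's text to pass/fail. In the paper the model itself
# is asked to verify each constraint; here plain functions stand in for it.
Verifier = Callable[[str], bool]


def score_step(step: CandidateStep, verifiers: Dict[str, Verifier]) -> Tuple[int, float]:
    """Assumed scoring rule: count of verifiers passed, then negated perplexity."""
    passed = sum(1 for check in verifiers.values() if check(step.text))
    return (passed, -perplexity(step.token_logprobs))


def pick_step(candidates: List[CandidateStep], verifiers: Dict[str, Verifier]) -> CandidateStep:
    """Return the candidate with the best (verifiers passed, low perplexity) score."""
    return max(candidates, key=lambda step: score_step(step, verifiers))


if __name__ == "__main__":
    # Stub verifiers for illustration only.
    verifiers: Dict[str, Verifier] = {
        "relevance": lambda text: True,
        "mathematical_accuracy": lambda text: text.endswith("4"),
        "logical_consistency": lambda text: True,
    }
    candidates = [
        CandidateStep("2 + 2 = 4", [-0.1, -0.2]),
        CandidateStep("2 + 2 = 5", [-0.05, -0.05]),
    ]
    print(pick_step(candidates, verifiers).text)
```

In this sketch the second candidate has lower perplexity but fails the stubbed accuracy check, so the first is chosen. How the paper actually weighs the verifiers against perplexity is not stated in the abstract, so the lexicographic ordering above is only one plausible choice.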

Taxonomy

Topics: Cognitive Computing and Networks · Mental Health Research Topics