Active Prompting with Chain-of-Thought for Large Language Models

Shizhe Diao; Pengcheng Wang; Yong Lin; Rui Pan; Xiang Liu; Tong Zhang

arXiv:2302.12246 · cs.CL · July 23, 2024 · 44 cites

TL;DR

This paper introduces Active-Prompt, a method that uses active learning principles to select the most informative questions for annotation, enhancing chain-of-thought prompting in large language models and achieving state-of-the-art results on reasoning tasks.

Contribution

It proposes an active learning approach for choosing which questions to annotate as prompt exemplars for LLMs, improving reasoning performance over a fixed set of human-annotated exemplars.

Findings

1. Achieves state-of-the-art on eight reasoning tasks
2. Effective question selection improves model performance
3. Uncertainty metrics guide optimal example annotation
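The selection step behind these findings can be illustrated with a minimal sketch. It assumes the model has already been sampled k times per question, and it scores uncertainty either as the fraction of distinct answers or as the entropy of the answer distribution. The function names, the `sampled` dictionary layout, and the choice of these two metrics are assumptions made for illustration, not the authors' code.

```python
from collections import Counter
from math import log
from typing import Callable, Dict, List


def disagreement(answers: List[str]) -> float:
    """Fraction of distinct answers among the k samples (higher = more uncertain)."""
    return len(set(answers)) / len(answers)


def entropy(answers: List[str]) -> float:
    """Shannon entropy (natural log) of the empirical answer distribution."""
    k = len(answers)
    counts = Counter(answers)
    return -sum((c / k) * log(c / k) for c in counts.values())


def select_for_annotation(
    sampled: Dict[str, List[str]],
    n: int,
    metric: Callable[[List[str]], float] = disagreement,
) -> List[str]:
    """Return the n questions whose sampled answers are most uncertain.

    `sampled` maps each question to the k answers the model produced for it
    (hypothetical layout, chosen for this sketch).
    """
    ranked = sorted(sampled, key=lambda q: metric(sampled[q]), reverse=True)
    return ranked[:n]


# Toy data: three questions, four sampled answers each.
sampled_answers = {
    "q1": ["4", "4", "4", "4"],  # model is consistent
    "q2": ["1", "2", "3", "4"],  # model disagrees with itself every time
    "q3": ["7", "7", "8", "7"],  # mild disagreement
}
to_annotate = select_for_annotation(sampled_answers, n=2)
```

In this toy run, `to_annotate` holds the two questions with the most disagreement, which a human would then annotate with chain-of-thought rationales and use as exemplars.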

Abstract

The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions…

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- Combining active learning with prompt construction is interesting and novel to me
- With the extensive experiments and analysis, the execution is definitely above average
- Writing is clear

Weaknesses

- [Major] An important and very relevant baseline is missing: https://arxiv.org/abs/2210.00720. Their method is very similar to Active Prompt and simply selects the longest training instances. I would be curious to see how it compares to this work.
- [Major] One can imagine that if the model is reasonably good, the demonstrations selected by Active-Prompt will be more useful. I wonder whether this is still the case for “weaker” models. If the model does not know too much about the task, will the

Reviewer 02 · Rating 1 (strong reject) · Confidence 5

Strengths

The idea is straightforward and the motivation is clear. The method makes sense.

Weaknesses

1. **Baselines are too weak, leading to a misunderstanding of the effectiveness of the proposed method.** I would like to urge the authors to include more powerful baselines in the experiment rather than hide them. ALL the reviewers are experts in this domain and familiar with the state-of-the-art performance of LLMs on these benchmarks in this domain. In the experiment section, the authors only include the CoT annotations from [1] as the most important baseline. It is widely acknowledged and st

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- Overall the paper is written clearly and proposes an approach for example selection for chain-of-thought prompting. The method uses existing approaches from active learning and shows improvements over baselines.
- The authors evaluate their approach on a range of mathematical and commonsense reasoning tasks, and conduct ablations to understand the effect of different factors.

Weaknesses

- The approach seems to have limited applicability as it requires the existence of either large enough datasets for a particular task or similar task to sample from. The authors also report variations between different annotators, further attesting to the difficulty of the task.
- Some details in the paper are missing. For example, how is the variance based approach applied to textual answers? There are no results presented with the self-confidence approach and only an example is given, etc.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms