Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Thomas T. Hills

TL;DR
This paper explores using metacognitive prompts like "could you be wrong?" to help large language models identify biases and errors, improving their decision-making and aligning responses with human reasoning.
Contribution
It introduces a novel prompt-based debiasing strategy inspired by human metacognition, demonstrating its effectiveness in revealing biases and errors in LLM responses.
Findings
Metacognitive prompts reveal biases and errors in LLMs.
Prompting improves LLM self-awareness and response quality.
Aligns LLM responses more closely with human reasoning.
Abstract
Identifying bias in LLMs is ongoing. Because they are still in development, what is true today may be false tomorrow. We therefore need general strategies for debiasing that will outlive current models. Strategies developed for debiasing human decision making offer one promising approach as they incorporate an LLM-style prompt intervention designed to bring latent knowledge into awareness during decision making. LLMs trained on vast amounts of information contain information about potential biases, counter-arguments, and contradictory evidence, but that information may only be brought to bear if prompted. Metacognitive prompts developed in the human decision making literature are designed to achieve this, and as I demonstrate here, they show promise with LLMs. The prompt I focus on here is "could you be wrong?" Following an LLM response, this prompt leads LLMs to produce additional…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
