Internalized Self-Correction for Large Language Models

Nishanth Upadhyaya; Raghavendra Sridharamurthy

arXiv:2412.16653 · cs.AI · December 24, 2024

TL;DR

This paper proposes 'Internalized Self-Correction' (InSeC), a novel training method enabling large language models to self-correct by learning from introduced mistakes and corrections, improving their accuracy and reliability.

Contribution

The paper introduces InSeC, a new training approach that combines negative sampling and self-reflection to enhance LLM self-correction capabilities during training.

Findings

01. InSeC improves LLM accuracy in self-correction tasks.

02. The method reduces hallucinations and incorrect outputs.

03. The approach can be extended to improve instruction following.

Abstract

In this article, we introduce 'Internalized Self-Correction' (InSeC) for large language models (LLMs). While many approaches exist for self-reflection at inference time, we propose a novel method that combines ideas from negative sampling, self-reflection during training, and inference time. InSeC allows LLMs to correct themselves by introducing mistakes and their corresponding corrections during training, thereby converting the learning process into a true supervised learning task with both positive and negative examples. This approach can be extended to improve instruction following and correct hallucinations or incorrect sentences generated by LLMs.
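
The abstract describes training on introduced mistakes paired with their corrections. The following is a minimal sketch of how one such training sequence might be assembled, not the authors' implementation. The function name `make_insec_example`, the `<correction>` marker, and the choice to mask the injected mistake out of the loss are all assumptions made for illustration; the abstract does not specify a data format or loss treatment.

```python
# Hypothetical marker that signals "the previous token was a mistake".
# The abstract does not define such a token; it is assumed here.
CORRECTION_MARKER = "<correction>"


def make_insec_example(prompt, correct_tokens, wrong_token, position,
                       marker=CORRECTION_MARKER):
    """Build one training sequence containing an injected mistake.

    The target keeps the correct prefix, inserts `wrong_token` at
    `position`, then the marker, then resumes the correct continuation.
    """
    if not 0 <= position < len(correct_tokens):
        raise ValueError("position must index into correct_tokens")

    prefix = correct_tokens[:position]
    continuation = correct_tokens[position:]
    target = prefix + [wrong_token, marker] + continuation

    # Assumed loss treatment: 1 = token contributes to the loss,
    # 0 = token is masked, so the model sees the mistake as context
    # without being trained to produce it.
    loss_mask = [1] * len(prefix) + [0, 1] + [1] * len(continuation)

    return {"prompt": prompt, "target": target, "loss_mask": loss_mask}


example = make_insec_example(
    prompt="What is the capital of France?",
    correct_tokens=["The", "capital", "is", "Paris", "."],
    wrong_token="Lyon",
    position=3,
)
print(example["target"])
print(example["loss_mask"])
```

Under these assumptions, the model is trained to emit the marker and the correct continuation after seeing a wrong token, which is one way to read "positive and negative examples" in a single supervised sequence.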

Taxonomy

Topics: Topic Modeling