Internalized Self-Correction for Large Language Models
Nishanth Upadhyaya, Raghavendra Sridharamurthy

TL;DR
This paper proposes 'Internalized Self-Correction' (InSeC), a novel training method enabling large language models to self-correct by learning from introduced mistakes and corrections, improving their accuracy and reliability.
Contribution
The paper introduces InSeC, a training approach that combines negative sampling with self-reflection so that LLMs learn to correct their own mistakes.
Findings
InSeC improves LLM accuracy in self-correction tasks.
The method reduces hallucinations and incorrect outputs.
The method also improves instruction following.
Abstract
In this article, we introduce 'Internalized Self-Correction' (InSeC) for large language models (LLMs). While many approaches exist for self-reflection at inference time, we propose a novel method that combines ideas from negative sampling, self-reflection during training, and inference time. InSeC allows LLMs to correct themselves by introducing mistakes and their corresponding corrections during training, thereby converting the learning process into a true supervised learning task with both positive and negative examples. This approach can be extended to improve instruction following and correct hallucinations or incorrect sentences generated by LLMs.
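The abstract does not specify how mistakes and corrections are encoded in the training data, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes a hypothetical correction marker token (`<fix>`), a hypothetical `error_rate` parameter, and a loss mask that excludes the injected wrong tokens so the model is not trained to produce them. All names here are illustrative.

```python
import random
from typing import List, Tuple

# Hypothetical special token that signals "the previous token was a mistake".
# This marker is an assumption for illustration; it does not come from the paper.
CORRECTION_MARKER = "<fix>"


def inject_mistakes(
    tokens: List[str],
    vocab: List[str],
    error_rate: float,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Build a training sequence containing mistakes and their corrections.

    Before each correct token, with probability `error_rate`, a wrong token
    and the correction marker are inserted. The returned loss mask is 0 on
    injected wrong tokens (negative examples the model should not learn to
    emit) and 1 on every other position.
    """
    augmented: List[str] = []
    loss_mask: List[int] = []
    for tok in tokens:
        candidates = [v for v in vocab if v != tok]
        if candidates and rng.random() < error_rate:
            augmented.append(rng.choice(candidates))  # injected mistake
            loss_mask.append(0)
            augmented.append(CORRECTION_MARKER)  # signal that a fix follows
            loss_mask.append(1)
        augmented.append(tok)  # the correct token
        loss_mask.append(1)
    return augmented, loss_mask


def strip_corrections(tokens: List[str]) -> List[str]:
    """Resolve a sequence by dropping each marker and the token before it."""
    resolved: List[str] = []
    for tok in tokens:
        if tok == CORRECTION_MARKER:
            if resolved:
                resolved.pop()  # discard the token flagged as a mistake
        else:
            resolved.append(tok)
    return resolved


if __name__ == "__main__":
    sentence = ["Paris", "is", "the", "capital", "of", "France"]
    vocabulary = sentence + ["Berlin", "Rome"]
    augmented, mask = inject_mistakes(sentence, vocabulary, 0.3, random.Random(0))
    print(augmented)
    print(mask)
    print(strip_corrections(augmented))
```

At inference time, under the same assumptions, a model trained on such sequences could emit the marker after a token it judges to be wrong, and `strip_corrections` would resolve the output to its corrected form.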
Taxonomy
Topics: Topic Modeling
