Beyond Labels: Aligning Large Language Models with Human-like Reasoning
Muhammad Rafsan Kabir, Rafeed Mohammad Sultan, Ihsanul Haque Asif, Jawad Ibn Ahad, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman

TL;DR
This paper introduces a novel fine-tuning method for large language models that incorporates ethical reasons alongside labels, significantly improving their alignment with human morals and reasoning in decision-making tasks.
Contribution
The study presents a new fine-tuning approach using both ethics labels and reasons, and introduces the DFAR dataset to enhance LLMs' human-like ethical reasoning capabilities.
Findings
Proposed fine-tuning method outperforms existing approaches in accuracy.
Models fine-tuned with reasons show lower misalignment in ethical reasoning.
Enhanced alignment with human ethics demonstrated through evaluation.
Abstract
Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Current models raise ethical concerns because they are prone to generating false positives and providing malicious responses. To address this issue, we curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to help align language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a novel fine-tuning approach that uses ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that uses only labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an…
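The difference between label-only (L) and label-plus-reason (L+R) fine-tuning can be pictured as a difference in the training target. The sketch below is illustrative only and is not the authors' implementation: the `EthicsExample` class, the prompt wording, the `build_pair` helper, and the sample statement are all hypothetical, and the real DFAR format may differ.

```python
from dataclasses import dataclass


@dataclass
class EthicsExample:
    """Hypothetical record shape: a statement, its label, and a reason."""
    statement: str
    label: str   # assumed to be "ethical" or "unethical"
    reason: str


def build_pair(example: EthicsExample, with_reason: bool) -> dict:
    """Build one prompt/completion pair for supervised fine-tuning.

    with_reason=False gives a label-only (L) target;
    with_reason=True gives a label-plus-reason (L+R) target.
    The prompt template is an assumption made for illustration.
    """
    prompt = f"Statement: {example.statement}\nIs this ethical or unethical?"
    completion = f"Label: {example.label}"
    if with_reason:
        completion += f"\nReason: {example.reason}"
    return {"prompt": prompt, "completion": completion}


# Made-up sample, not taken from DFAR.
sample = EthicsExample(
    statement="I returned the wallet I found to its owner.",
    label="ethical",
    reason="Returning lost property respects the owner's rights.",
)

label_only = build_pair(sample, with_reason=False)
label_and_reason = build_pair(sample, with_reason=True)

print(label_only["completion"])
print(label_and_reason["completion"])
```

Both variants share the same prompt; only the completion changes, so under this assumed setup the L+R model is additionally trained to produce the reason text.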
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: ALIGN
