Beyond Labels: Aligning Large Language Models with Human-like Reasoning

Muhammad Rafsan Kabir; Rafeed Mohammad Sultan; Ihsanul Haque Asif; Jawad Ibn Ahad; Fuad Rahman; Mohammad Ruhul Amin; Nabeel Mohammed; Shafin Rahman

arXiv:2408.11879·cs.CL·August 23, 2024

TL;DR

This paper introduces a fine-tuning method for large language models that trains on ethical reasons alongside ethical-unethical labels, improving the models' alignment with human morals and reasoning in decision-making tasks.

Contribution

The study presents a new fine-tuning approach using both ethics labels and reasons, and introduces the DFAR dataset to enhance LLMs' human-like ethical reasoning capabilities.

Findings

01. The proposed fine-tuning method (labels plus reasons, L+R) outperforms the existing label-only (L) approach in accuracy.

02. Models fine-tuned with reasons show lower misalignment in ethical reasoning.

03. Evaluation demonstrates improved alignment with human ethics.

Abstract

Aligning large language models (LLMs) with a human reasoning approach ensures that LLMs produce morally correct and human-like decisions. Ethical concerns arise because current models are prone to generating false positives and providing malicious responses. To address this issue, we have curated an ethics dataset named Dataset for Aligning Reasons (DFAR), designed to aid in aligning language models to generate human-like reasons. The dataset comprises statements with ethical-unethical labels and their corresponding reasons. In this study, we employed a novel fine-tuning approach that utilizes ethics labels and their corresponding reasons (L+R), in contrast to the existing fine-tuning approach that only uses labels (L). The original pre-trained versions, the existing fine-tuned versions, and our proposed fine-tuned versions of LLMs were then evaluated on an…
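The difference between label-only (L) and label-plus-reason (L+R) fine-tuning can be pictured as a difference in the training target. The sketch below is illustrative only and is not taken from the paper or its repository: the EthicsExample class, the build_training_pair helper, the prompt wording, and the sample statement are all hypothetical, and the actual DFAR fields and prompt templates may differ.

```python
from dataclasses import dataclass


@dataclass
class EthicsExample:
    """Hypothetical record shape: a statement, its label, and a human-written reason."""
    statement: str
    label: str   # "ethical" or "unethical"
    reason: str


def build_training_pair(example: EthicsExample, with_reason: bool) -> dict:
    """Build a prompt/target pair for supervised fine-tuning.

    with_reason=False gives a label-only (L) target.
    with_reason=True gives a label-plus-reason (L+R) target.
    The prompt template is an assumption, not the paper's.
    """
    prompt = f"Statement: {example.statement}\nIs this ethical or unethical?"
    target = f"Label: {example.label}"
    if with_reason:
        target += f"\nReason: {example.reason}"
    return {"prompt": prompt, "target": target}


# Made-up example for illustration; not drawn from DFAR.
example = EthicsExample(
    statement="I returned the extra change the cashier gave me by mistake.",
    label="ethical",
    reason="Keeping money that was handed over in error would be dishonest.",
)

label_only = build_training_pair(example, with_reason=False)
label_and_reason = build_training_pair(example, with_reason=True)

print(label_only["target"])
print(label_and_reason["target"])
```

Both variants share the same prompt, so under this sketch the only thing that changes between L and L+R is whether the model is also trained to produce the reason.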

Code & Models

Repositories

apurba-nsu-rnd-lab/dfar
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN