Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

Yejin Bang; Tiezheng Yu; Andrea Madotto; Zhaojiang Lin; Mona Diab; Pascale Fung

arXiv:2210.07652 · cs.CL · October 17, 2022

TL;DR

This paper introduces a framework for creating classifiers that explicitly incorporate human values, using large language models to generate training data, resulting in improved performance and greater inclusivity and explainability.

Contribution

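The paper presents a value-aligned classification framework in which the classifier predicts according to a human value stated explicitly in its input. To build such classifiers, it distills value-aligned knowledge from large language models into smaller classification models.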

Findings

01. VA-Models outperform baselines by at least 15.56% F1-score

02. Generated data from LLMs improves classifier performance

03. Explicit human value input enhances AI inclusivity and explainability

Abstract

Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models with the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines by at least 15.56% on the F1-score, including few-shot learning with OPT-175B and existing text augmentation methods. We suggest that using…
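The two-step approach described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the prompt layout, the `Example` class, the function names `build_generation_prompt` and `build_classifier_input`, the label strings, and the `[SEP]` separator are all hypothetical. The sketch only shows how an explicitly written value could be placed in a few-shot prompt for data generation (step one) and paired with a text as input to a smaller classifier (step two); the LLM call and the fine-tuning itself are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One few-shot demonstration: a text and its label under a given value."""
    text: str
    label: str


def build_generation_prompt(value: str, label: str, shots: List[Example]) -> str:
    """Step 1 (hypothetical format): a few-shot prompt that asks an LLM to
    write a new text carrying `label` with respect to the stated `value`."""
    lines = [f"Value: {value}", ""]
    for ex in shots:
        lines += [f"Label: {ex.label}", f"Text: {ex.text}", ""]
    # The prompt ends with an open "Text:" slot for the LLM to complete.
    lines += [f"Label: {label}", "Text:"]
    return "\n".join(lines)


def build_classifier_input(value: str, text: str, sep: str = " [SEP] ") -> str:
    """Step 2 (hypothetical format): pair the explicit value with the text,
    so a smaller classifier conditions its prediction on the value."""
    return f"{value}{sep}{text}"


if __name__ == "__main__":
    value = "Jokes that rely on gender stereotypes are not acceptable."
    shots = [
        Example("She fixed the server faster than anyone on the team.", "aligned"),
        Example("Of course she got lost, women can't read maps.", "violates"),
    ]
    print(build_generation_prompt(value, "violates", shots))
    print(build_classifier_input(value, "He is too emotional to lead, like a girl."))
```

Because the value is part of the input, the same text can receive different labels under different values, which is the behaviour the framework targets.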

Taxonomy

Topics: Hate Speech and Cyberbullying Detection