Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models

Ju-Young Kim; Ji-Hong Park; Se-Yeon Lee; Sujin Park; Gun-Woo Kim

arXiv:2512.08480·cs.CL·December 10, 2025

Open Access

TL;DR

This paper introduces a soft inductive bias method that uses explicit reasoning perspectives to improve inappropriate utterance detection with Korean large language models, achieving higher accuracy and more rational judgments.

Contribution

The paper proposes guiding large language models with explicitly defined reasoning perspectives, which makes their detection of inappropriate utterances more accurate.

Findings

01 The Kanana-1.5 model achieves 87.0046% accuracy

02 The method improves accuracy by approximately 3.89% over standard supervised learning

03 Explicit reasoning perspectives lead to more precise and consistent judgments

Abstract

Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large…
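To make the idea of "explicitly defined reasoning perspectives" concrete, here is a minimal sketch of how such perspectives might be written into a prompt so that the model reasons along them before giving a label. Everything in it is an assumption for illustration: the function name `build_prompt`, the two example perspectives, and the prompt wording are hypothetical and are not taken from the paper, whose actual perspectives and training format are not given in this excerpt.

```python
from typing import Sequence

# Hypothetical perspectives, for illustration only.
# The paper's actual list is not available in this excerpt.
EXAMPLE_PERSPECTIVES = (
    "Does the utterance insult or demean the listener?",
    "Does the surrounding conversation change how the utterance should be read?",
)


def build_prompt(
    dialogue: Sequence[str],
    target: str,
    perspectives: Sequence[str] = EXAMPLE_PERSPECTIVES,
) -> str:
    """Assemble a prompt that lists the reasoning perspectives explicitly."""
    lines = ["Conversation:"]
    # One bullet per conversational turn, in order.
    lines += [f"- {turn}" for turn in dialogue]
    lines.append(f"Target utterance: {target}")
    lines.append("Reason about the target utterance from each perspective in order:")
    # Number the perspectives so the reasoning can follow a fixed order.
    lines += [f"{i}. {p}" for i, p in enumerate(perspectives, start=1)]
    lines.append("Final answer (appropriate / inappropriate):")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(["Nice play.", "You are useless."], "You are useless."))
```

The bias is "soft" in the sense that the perspectives only steer the reasoning through the text of the prompt or training target; nothing in this sketch constrains the model's output structurally.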

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling