Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models
Ju-Young Kim, Ji-Hong Park, Se-Yeon Lee, Sujin Park, Gun-Woo Kim

TL;DR
This paper introduces a soft inductive bias method that uses explicit reasoning perspectives to improve inappropriate utterance detection with Korean large language models, achieving higher accuracy and more rational judgments.
Contribution
It proposes a novel approach that guides large language models with explicit reasoning perspectives, enhancing their ability to detect inappropriate utterances more accurately.
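As a rough illustration of guiding a model with explicit reasoning perspectives, the sketch below builds a prompt that asks the model to reason through a fixed list of perspectives before giving a label. The perspective names, the prompt wording, and the function build_prompt are all hypothetical placeholders; this summary does not list the perspectives or the prompt format the paper actually uses.

```python
# Minimal sketch, not the paper's implementation.
# ASSUMPTION: these perspective names are invented for illustration;
# the source does not specify the actual reasoning perspectives.
PERSPECTIVES = ("literal meaning", "speaker intent", "conversational context")


def build_prompt(utterance: str, perspectives=PERSPECTIVES) -> str:
    """Return a prompt that lists each reasoning perspective explicitly."""
    lines = [
        "Decide whether the utterance below is inappropriate.",
        f"Utterance: {utterance}",
        "Reason through each perspective in order before answering:",
    ]
    # One numbered slot per perspective keeps the reasoning order fixed.
    for i, perspective in enumerate(perspectives, start=1):
        lines.append(f"{i}. {perspective}:")
    lines.append("Final label (appropriate / inappropriate):")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("example text"))
```

Fixing the perspectives in the prompt, rather than letting the model choose its own reasoning path, is one plausible way to read "soft inductive bias" here: the structure nudges the reasoning without hard-coding the answer.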
Findings
The Kanana-1.5 model achieves 87.0046% accuracy
The proposed method improves accuracy by approximately 3.89% over standard supervised learning
Explicit reasoning perspectives lead to more precise and consistent judgments
Abstract
Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large…
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
