AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI

Chae-Gyun Lim; Seung-Ho Han; EunYoung Byun; Jeongyun Han; Soohyun Cho; Eojin Joo; Heehyeon Kim; Sieun Kim; Juhoon Lee; Hyunsoo Lee; Dongkun Lee; Jonghwan Hyeon; Yechan Hwang; Young-Jun Lee; Kyeongryul Lee; Minhyeong An; Hyunjun Ahn; Jeongwoo Son; Junho Park; Donggyu Yoon; Taehyung Kim; Jeemin Kim; Dasom Choi; Kwangyoung Lee; Hyunseung Lim; Yeohyun Jung; Jongok Hong; Sooyohn Nam; Joonyoung Park; Sungmin Na; Yubin Choi; Jeanne Choi; Yoojin Hong; Sueun Jang; Youngseok Seo; Somin Park; Seoungung Jo; Wonhye Chae; Yeeun Jo; Eunyoung Kim; Joyce Jiyoung Whang; HwaJung Hong; Joseph Seering; Uichin Lee; Juho Kim; Sunna Choi; Seokyeon Ko; Taeho Kim; Kyunghoon Kim; Myungsik Ha; So Jung Lee; Jemin Hwang; JoonHo Kwak; Ho-Jin Choi

arXiv:2511.20686 · cs.AI · November 27, 2025

TL;DR

AssurAI is a comprehensive, quality-controlled Korean multimodal dataset designed to evaluate the safety of generative AI, addressing language and cultural gaps in existing safety datasets.

Contribution

We created AssurAI, a large-scale Korean multimodal safety dataset with a new risk taxonomy, rigorous quality control, and validation for assessing generative AI safety in Korean socio-cultural contexts.

Findings

01. AssurAI effectively assesses the safety of recent LLMs.
02. The dataset covers 35 risk factors across four modalities: text, image, video, and audio.
03. Rigorous quality control ensures high data integrity.

Abstract

The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric and often limited to the text modality, so they fail to capture risks specific to non-English socio-cultural contexts such as Korean. To address this gap, we introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. First, we define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks by a multidisciplinary expert group to cover both universal harms and risks relevant to the Korean socio-cultural context. Second, leveraging this taxonomy, we construct and release AssurAI, a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio. Third, we apply a rigorous quality control process to ensure data integrity, featuring…
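To make the dataset's shape concrete, the following is a minimal sketch of how instances tagged with a modality and a risk factor could be tallied to check taxonomy coverage. The record layout (`Instance`, `instance_id`, `modality`, `risk_factor`) and the helper `coverage_report` are hypothetical names chosen for illustration; the released dataset's actual schema is not described in the abstract and may differ. Only the four modality names and the taxonomy size of 35 come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# The four modalities and the taxonomy size are taken from the abstract.
MODALITIES = ("text", "image", "video", "audio")
NUM_RISK_FACTORS = 35


@dataclass(frozen=True)
class Instance:
    """Hypothetical record layout; the real AssurAI schema may differ."""
    instance_id: str
    modality: str
    risk_factor: int  # assumed 1-based index into the 35-factor taxonomy

    def __post_init__(self):
        # Reject records that fall outside the stated modalities or taxonomy.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")
        if not 1 <= self.risk_factor <= NUM_RISK_FACTORS:
            raise ValueError(f"risk factor out of range: {self.risk_factor}")


def coverage_report(instances):
    """Count instances per modality and list risk factors with no instances."""
    by_modality = Counter(inst.modality for inst in instances)
    covered = {inst.risk_factor for inst in instances}
    missing = sorted(set(range(1, NUM_RISK_FACTORS + 1)) - covered)
    return {
        "by_modality": {m: by_modality.get(m, 0) for m in MODALITIES},
        "missing_risk_factors": missing,
    }


if __name__ == "__main__":
    # Toy records for illustration only; these are not drawn from the dataset.
    sample = [
        Instance("a1", "text", 1),
        Instance("a2", "image", 1),
        Instance("a3", "text", 2),
    ]
    report = coverage_report(sample)
    print(report["by_modality"])
    print(len(report["missing_risk_factors"]), "risk factors without instances")
```

A report of this kind is one simple way to confirm that every risk factor has at least one instance before a dataset is used for evaluation; it says nothing about the quality of the instances themselves.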

Code & Models

Datasets

TTA01/AssurAI
dataset · 440 downloads

Taxonomy

Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)