OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models

Yıldırım Özen; Burak Erinç Çetin; Kaan Engür; Elif Naz Demiryılmaz; Cagri Toraman

arXiv:2505.16036·cs.CL·January 9, 2026

TL;DR

This paper evaluates 29 open-source large language models on four ethical dimensions (robustness, reliability, safety, and fairness) in English and Turkish. Many models perform well on safety, fairness, and robustness, while reliability remains a concern. The results are intended as a guide for safer model development.

Contribution

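The paper introduces a novel dataset and uses it to evaluate 29 open-source LLMs on four ethical dimensions in both a high-resource language (English) and a low-resource language (Turkish). This addresses three limits of prior ethical studies: narrow focus, little language diversity, and a restricted set of models.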

Findings

01. Many models perform well in safety, fairness, and robustness.

02. Reliability remains a significant challenge across models.

03. Larger models tend to have better ethical performance.

Abstract

Generative large language models present significant potential but also raise critical ethical concerns, including issues of safety, fairness, robustness, and reliability. Most existing ethical studies, however, are limited by their narrow focus, a lack of language diversity, and an evaluation of a restricted set of models. To address these gaps, we present a broad ethical evaluation of 29 recent open-source LLMs using a novel dataset that assesses four key ethical dimensions: robustness, reliability, safety, and fairness. Our analysis includes both a high-resource language, English, and a low-resource language, Turkish, providing a comprehensive assessment and a guide for safer model development. Using an LLM-as-a-Judge methodology, our experimental results indicate that many open-source models demonstrate strong performance in safety, fairness, and robustness, while reliability…
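The evaluation design described in the abstract (judge verdicts collected per model, per ethical dimension, and per language) can be illustrated with a minimal aggregation sketch. Everything specific here is an assumption made for illustration: the binary pass/fail verdict, the language codes `en` and `tr`, and the function name `aggregate_verdicts` are hypothetical, and the paper's actual judge prompt, scoring scale, and metrics are not given in this summary.

```python
from collections import defaultdict
from statistics import mean

# The four dimensions and two languages named in the abstract.
DIMENSIONS = ("robustness", "reliability", "safety", "fairness")
LANGUAGES = ("en", "tr")  # assumed codes for English and Turkish


def aggregate_verdicts(verdicts):
    """Average judge verdicts per (model, dimension, language).

    `verdicts` is an iterable of (model, dimension, language, passed)
    tuples, where `passed` is a boolean produced by a judge model.
    The binary verdict is an assumption of this sketch, not the
    paper's documented scoring scheme.
    """
    buckets = defaultdict(list)
    for model, dimension, language, passed in verdicts:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if language not in LANGUAGES:
            raise ValueError(f"unknown language: {language}")
        buckets[(model, dimension, language)].append(1.0 if passed else 0.0)
    # Pass rate in [0, 1] for every observed combination.
    return {key: mean(values) for key, values in buckets.items()}
```

Keeping the language in the key rather than pooling it makes it possible to compare the same model and dimension across English and Turkish, which is the comparison the abstract emphasizes.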

Code & Models

Repositories

metunlp/openethics (Official)

Taxonomy

Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Topic Modeling

Methods: Focus