OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models
Y{\i}ld{\i}r{\i}m \"Ozen, Burak Erin\c{c} \c{C}etin, Kaan Eng\"ur, Elif Naz Demiry{\i}lmaz, Cagri Toraman

TL;DR
This paper conducts a broad ethical evaluation of 29 open-source large language models across multiple dimensions and languages, revealing strengths in safety and fairness but highlighting reliability issues, with implications for safer model development.
Contribution
It introduces a comprehensive, multi-dimensional ethical assessment framework applied to diverse open-source LLMs in multiple languages, filling gaps in prior narrow or limited evaluations.
Findings
Many models perform well in safety, fairness, and robustness.
Reliability remains a significant challenge across models.
Larger models tend to have better ethical performance.
Abstract
Generative large language models present significant potential but also raise critical ethical concerns, including issues of safety, fairness, robustness, and reliability. Most existing ethical studies, however, are limited by their narrow focus, a lack of language diversity, and an evaluation of a restricted set of models. To address these gaps, we present a broad ethical evaluation of 29 recent open-source LLMs using a novel dataset that assesses four key ethical dimensions: robustness, reliability, safety, and fairness. Our analysis includes both a high-resource language, English, and a low-resource language, Turkish, providing a comprehensive assessment and a guide for safer model development. Using an LLM-as-a-Judge methodology, our experimental results indicate that many open-source models demonstrate strong performance in safety, fairness, and robustness, while reliability…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEthics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Topic Modeling
MethodsFocus
