Integrating gender inclusivity into large language models via instruction tuning

Alina Wr\'oblewska; Bartosz \.Zuk

arXiv:2508.18466·cs.CL·August 27, 2025

Integrating gender inclusivity into large language models via instruction tuning

Alina Wr\'oblewska, Bartosz \.Zuk

PDF

1 Datasets

TL;DR

This paper proposes a method to reduce gender bias in large language models for Polish by instruction tuning with a gender-inclusive dataset and guidelines, aiming to promote fairer language generation.

Contribution

It introduces a systematic instruction tuning approach using the IPIS dataset and explicit guidelines to embed gender inclusivity into multilingual and Polish-specific LLMs.

Findings

01

Reduced gender bias in model outputs

02

Effective integration of gender-inclusive guidelines

03

Applicable to multiple LLM architectures

Abstract

Imagine a language with masculine, feminine, and neuter grammatical genders, yet, due to historical and political conventions, masculine forms are predominantly used to refer to men, women and mixed-gender groups. This is the reality of contemporary Polish. A social consequence of this unfair linguistic system is that large language models (LLMs) trained on Polish texts inherit and reinforce this masculine bias, generating gender-imbalanced outputs. This study addresses this issue by tuning LLMs using the IPIS dataset, a collection of human-crafted gender-inclusive proofreading in Polish and Polish-to-English translation instructions. Grounded in a theoretical linguistic framework, we design a system prompt with explicit gender-inclusive guidelines for Polish. In our experiments, we IPIS-tune multilingual LLMs (Llama-8B, Mistral-7B and Mistral-Nemo) and Polish-specific LLMs (Bielik and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

ipipan/ipis
dataset· 3 dl
3 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.