An Empirical Study on the Characteristics of Bias upon Context Length Variation for Bangla

Jayanta Sadhu; Ayan Antik Khan; Abhik Bhattacharjee; Rifat Shahriyar

arXiv:2406.17375 · cs.CL · June 26, 2024

TL;DR

This study investigates how context length influences intrinsic gender bias measurement in Bangla language models. It creates a dataset and adapts existing bias measurement methods for this low-resource language, and finds that bias metrics depend clearly on context length.

Contribution

The paper introduces a Bangla dataset for intrinsic gender bias measurement, adapts existing bias measurement methods to Bangla, and analyzes how context length affects bias metrics.

Findings

01 Bias metrics depend on context length

02 Created a Bangla bias dataset

03 Resources are publicly available

Abstract

Pretrained language models inherently exhibit various social biases, prompting a crucial examination of their social impact across various linguistic contexts due to their widespread usage. Previous studies have provided numerous methods for intrinsic bias measurements, predominantly focused on high-resource languages. In this work, we aim to extend these investigations to Bangla, a low-resource language. Specifically, in this study, we (1) create a dataset for intrinsic gender bias measurement in Bangla, (2) discuss necessary adaptations to apply existing bias measurement methods for Bangla, and (3) examine the impact of context length variation on bias measurement, a factor that has been overlooked in previous studies. Through our experiments, we demonstrate a clear dependency of bias metrics on context length, highlighting the need for nuanced considerations in Bangla bias analysis.…
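The abstract does not spell out the metrics used, so the following is only a minimal sketch of the general idea of checking whether a bias score varies with context length. Everything in it is an assumption for illustration: the function name `bias_by_context_length`, the paired-sentence input format, the whitespace token count as the length measure, and the toy English placeholder sentences and scores (a real study would use Bangla sentences and a language model's scores).

```python
from collections import defaultdict
from statistics import mean


def bias_by_context_length(pairs, score_fn):
    """Average a per-pair score gap within each context-length bucket.

    pairs: iterable of (sentence_a, sentence_b) that differ only in the
        gendered term (hypothetical input format).
    score_fn: callable mapping a sentence to a model score, for example a
        log-likelihood (hypothetical; not the paper's metric).
    """
    buckets = defaultdict(list)
    for sent_a, sent_b in pairs:
        length = len(sent_a.split())  # context length in whitespace tokens
        buckets[length].append(score_fn(sent_a) - score_fn(sent_b))
    # Mean gap per length, ordered by length for easy comparison.
    return {length: mean(gaps) for length, gaps in sorted(buckets.items())}


# Toy scores standing in for a language model (made-up numbers).
toy_scores = {
    "he is kind": -2.0,
    "she is kind": -3.0,
    "he is a very kind person": -5.0,
    "she is a very kind person": -5.5,
}
toy_pairs = [
    ("he is kind", "she is kind"),
    ("he is a very kind person", "she is a very kind person"),
]
result = bias_by_context_length(toy_pairs, toy_scores.get)
print(result)
```

In this toy setup the gap differs between the 3-token and 6-token pairs, which is the kind of length dependency the abstract reports; the numbers themselves carry no empirical meaning.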

Code & Models

Repositories

csebuetnlp/BanglaContextualBias
Official

Datasets

csebuetnlp/BanglaContextualBias
Dataset · 37 downloads

Taxonomy

Topics: Language and cultural evolution · Computational and Text Analysis Methods · ICT in Developing Communities