HebID: Detecting Social Identities in Hebrew-language Political Text

Guy Mor-Lan; Naama Rivlin-Angert; Yael R. Kaplan; Tamir Sheafer; Shaul R. Shenhav

arXiv:2508.15483 · cs.CL · February 24, 2026

TL;DR

HebID is a new multilabel Hebrew dataset for detecting nuanced social identities in political texts, enabling analysis of identity expression and differences between elite discourse and public priorities.

Contribution

The paper introduces the first multilabel Hebrew corpus for social identity detection, benchmarks encoder models and generative LLMs on it, and uses the resulting classifier to analyze political discourse alongside public survey data.

Findings

01 Hebrew-tuned LLMs achieve the best results, with a macro-F1 of 0.74.

02 The analysis identifies gender and temporal variations in identity expression.

03 The analysis reveals differences between elite discourse and public identity priorities.

Abstract

Political language is deeply intertwined with social identities. While social identities are often shaped by specific cultural contexts and expressed through particular uses of language, existing datasets for group and identity detection are predominantly English-centric, single-label and focus on coarse identity categories. We introduce HebID, the first multilabel Hebrew corpus for social identity detection: 5,536 sentences from Israeli politicians' Facebook posts (Dec 2018 to Apr 2021), manually annotated for twelve nuanced social identities (e.g. Rightist, Ultra-Orthodox, Socially-oriented) grounded by survey data. We benchmark multilabel and single-label encoders alongside 2B-9B-parameter generative LLMs, finding that Hebrew-tuned LLMs provide the best results (macro-$F_1$ = 0.74). We apply our classifier to politicians' Facebook posts and parliamentary speeches, evaluating differences…
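For readers unfamiliar with the headline metric, the following is a minimal sketch of how macro-F1 is commonly computed in a multilabel setting: F1 is calculated separately for each label and then averaged without weighting, so rare identities count as much as frequent ones. The function name `macro_f1`, the toy sentences, and the zero-division convention (F1 = 0 for a label with no gold and no predicted instances) are assumptions made for illustration; they are not taken from the paper, whose exact evaluation code is not shown here. The three label names are borrowed from the examples in the abstract.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores over a multilabel dataset."""
    per_label = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that never occurs and is never predicted scores 0.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


# Hypothetical toy data: each sentence carries a set of identity labels (possibly empty).
labels = ["Rightist", "Ultra-Orthodox", "Socially-oriented"]
gold = [{"Rightist"}, {"Ultra-Orthodox", "Socially-oriented"}, set(), {"Rightist", "Socially-oriented"}]
pred = [{"Rightist"}, {"Ultra-Orthodox"}, {"Socially-oriented"}, {"Rightist"}]

score = macro_f1(gold, pred, labels)
print(f"macro-F1 = {score:.2f}")
```

In this toy example two labels are predicted perfectly and one is missed entirely, so the unweighted average is pulled down sharply by the single weak label. This sensitivity to every class is why macro-F1 is a common choice for imbalanced multilabel tasks.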

Taxonomy

Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Authorship Attribution and Profiling