For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network
Zihao Pan, Kai Peng, Shuai Ling, Haipeng Zhang

TL;DR
This paper introduces a novel Chinese Heterogeneous Graph Attention model that improves gender prediction accuracy from Chinese names by capturing character relationships and pronunciations, addressing limitations of existing tools and datasets.
Contribution
The paper presents the first Chinese Heterogeneous Graph Attention model for gender prediction, outperforming existing tools and providing a balanced dataset to support future gender bias research.
Findings
The model outperforms existing Chinese gender prediction tools.
The new dataset is more balanced than existing ones, making it more reliable for research.
Capturing character relationships and pronunciations significantly improves gender prediction accuracy.
Abstract
Achieving gender equality is an important pillar of humankind's sustainable future. Pioneering data-driven gender bias research is based on large-scale public records such as scientific papers, patents, and company registrations, covering female researchers, inventors, entrepreneurs, and so on. Since gender information is often missing from the relevant datasets, studies rely on tools to infer gender from names. However, available open-source Chinese gender-guessing tools are not yet suitable for scientific purposes, which may be partially responsible for Chinese women being underrepresented in mainstream gender bias research and may limit the universality of that research. Specifically, these tools focus on character-level information while overlooking the fact that the combinations of Chinese characters in multi-character names, as well as the components and pronunciations of characters, convey…
Taxonomy
Topics: Computational and Text Analysis Methods
