Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names
Duong Tien Pham, Luan Thanh Nguyen

TL;DR
Gendec is a novel machine learning framework for detecting gender from Japanese names. It is built on a new, comprehensive dataset and uses multiple modeling approaches, with practical applications in cultural and linguistic analysis.
Contribution
This work introduces a new Japanese name dataset and a versatile framework that combines traditional machine learning and transfer learning methods for gender detection.
Findings
Effective gender prediction demonstrated
Diverse approaches improve model robustness
Potential applications in cultural and linguistic analysis
Abstract
Every human has a name, a fundamental aspect of their identity and cultural heritage. A name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, gender. By detecting gender through the analysis of names, researchers can gain valuable insights into linguistic patterns and cultural norms that can inform practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques and cutting-edge transfer learning models, to accurately predict the gender associated with Japanese names. Through a thorough…
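To make the "traditional machine learning" side of the abstract concrete, here is a minimal sketch of one such approach: a multinomial naive Bayes classifier over the final characters (suffixes) of a romaji given name, with Laplace smoothing. This is an illustration under stated assumptions, not the authors' Gendec pipeline. The suffix features, the class name `SuffixNaiveBayes`, the helper `suffix_features`, and the eight toy names are all hypothetical and are not drawn from the Gendec dataset. The sketch ignores the hiragana and kanji forms, which would need different features.

```python
import math
from collections import Counter, defaultdict


def suffix_features(name: str, max_n: int = 3) -> list[str]:
    """Hypothetical features: the last 1..max_n characters of a romaji name."""
    name = name.lower()
    return [name[-n:] for n in range(1, max_n + 1) if len(name) >= n]


class SuffixNaiveBayes:
    """Multinomial naive Bayes over name-suffix features (illustrative only)."""

    def __init__(self, alpha: float = 1.0, max_n: int = 3):
        self.alpha = alpha  # Laplace smoothing constant
        self.max_n = max_n

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.feature_counts[label].update(suffix_features(name, self.max_n))
        # Vocabulary is every suffix seen in any class during training.
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def log_scores(self, name):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for f in suffix_features(name, self.max_n):
                if f not in self.vocab:
                    continue  # unseen suffixes carry no evidence; skip them
                score += math.log((self.feature_counts[c][f] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy, hand-picked training names (illustrative, not from the paper's dataset).
names = ["hanako", "yumiko", "akemi", "haruka", "taro", "ichiro", "kenta", "shota"]
labels = ["female"] * 4 + ["male"] * 4
model = SuffixNaiveBayes().fit(names, labels)
print(model.predict("keiko"))  # female
print(model.predict("jiro"))   # male
```

Suffixes are used here because common Japanese given-name endings such as "-ko" or "-ro" are a simple, well-known cue. A real system would add richer features (for example, kanji characters) or, as the abstract notes, transfer learning models.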
Taxonomy
Topics: Names, Identity, and Discrimination Research · Authorship Attribution and Profiling · Translation Studies and Practices
