Rethnicity: Predicting Ethnicity from Names
Fangzhou Xie

TL;DR
This paper introduces an R package called 'rethnicity' that predicts ethnicity from names using a Bidirectional LSTM model trained on Florida Voter Registration data. It pays particular attention to accuracy for minority groups and integrates the trained models into R via Rcpp.
Contribution
The paper presents a new R package for predicting ethnicity from names. It uses deep learning, addresses accuracy for minority groups by adjusting for imbalance in the training data, integrates the trained models into R, and compares the package with existing solutions.
Findings
High accuracy in ethnicity prediction, especially for minority groups
Effective integration of deep learning model with R via Rcpp
Competitive performance compared to existing solutions
Abstract
In this study, a new R package, rethnicity, is provided for predicting ethnicity based on names. A Bidirectional LSTM and the Florida Voter Registration data were used as the model and training data, respectively. Special care was given to accuracy for minority groups by adjusting for the imbalance in the dataset. The models were trained, exported to C++, and then integrated with R using Rcpp. Additionally, the availability, accuracy, and performance of the package were compared with other solutions.
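The abstract mentions adjusting for imbalance in the dataset but does not say how. As a minimal sketch of one common approach, the Python snippet below downsamples every class to the size of the rarest class. This is an assumption for illustration, not necessarily the author's procedure. The function name `undersample`, the placeholder labels, and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(records, seed=0):
    """Downsample every class to the size of the rarest class.

    records: list of (name, label) pairs.
    Illustrative only; the paper's actual balancing step may differ.
    """
    rng = random.Random(seed)

    # Group the records by their label.
    by_label = defaultdict(list)
    for name, label in records:
        by_label[label].append((name, label))

    # The rarest class sets the per-class sample size.
    target = min(len(rows) for rows in by_label.values())

    # Draw `target` records without replacement from each class.
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))

    rng.shuffle(balanced)
    return balanced


# Toy, made-up records with placeholder labels (not real data).
toy = (
    [("name_a%d" % i, "group_a") for i in range(8)]
    + [("name_b%d" % i, "group_b") for i in range(3)]
    + [("name_c%d" % i, "group_c") for i in range(5)]
)

balanced = undersample(toy)
counts = Counter(label for _, label in balanced)
print(dict(counts))
```

Downsampling discards majority-class examples in exchange for equal class frequencies during training. Oversampling or class-weighted losses are alternatives with the opposite trade-off.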
Taxonomy
Topics: Names, Identity, and Discrimination Research · Authorship Attribution and Profiling · Computational and Text Analysis Methods
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Distance to Modelled Embedding
