Small Business Classification By Name: Addressing Gender and Geographic Origin Biases

Daniel Shapiro

arXiv:2012.10348 · cs.LG · December 21, 2020

TL;DR

This paper develops a model to classify small business types from names, identifies gender and geographic biases, and explores methods to reduce these biases, balancing fairness and accuracy.

Contribution

It introduces bias mitigation techniques like name masking and data augmentation to reduce gender and geographic biases in business classification.

Findings

01

Hiding given names from the model reduced bias but lowered classification performance (top-1 F1-score fell from 60.2% to 56.6%).

02

Augmenting the training data with gender-swapped examples was less effective at bias reduction than hiding given names.

03

The model achieved a top-1 F1-score of 60.2% when predicting one of 66 business types.

Abstract

Small business classification is a difficult and important task within many applications, including customer segmentation. Training on small business names introduces gender and geographic origin biases. A model for predicting one of 66 business types based only upon the business name was developed in this work (top-1 F1-score = 60.2%). Two approaches to removing the bias from this model are explored: replacing given names with a placeholder token, and augmenting the training data with gender-swapped examples. The results for these approaches are reported, and the bias in the model was reduced by hiding given names from the model. However, bias reduction was accomplished at the expense of classification performance (top-1 F1-score = 56.6%). Augmentation of the training data with gender-swapping samples proved less effective at bias reduction than the name hiding approach on the evaluated…
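The two mitigation approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the toy name lexicon `GIVEN_NAME_SWAPS`, the placeholder token `<NAME>`, and the function names `mask_given_names`, `gender_swap`, and `augment` are all assumptions made for illustration, and a real system would need a far larger name list.

```python
import re

# Hypothetical toy lexicon pairing given names with a gender-swapped
# counterpart. The paper's actual name list is not reproduced here.
GIVEN_NAME_SWAPS = {
    "daniel": "danielle", "danielle": "daniel",
    "john": "jane", "jane": "john",
    "mario": "maria", "maria": "mario",
}
PLACEHOLDER = "<NAME>"  # assumed placeholder token

# Split into alphabetic runs and everything else, so punctuation and
# spacing are preserved when the pieces are joined back together.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def mask_given_names(business_name: str) -> str:
    """Approach 1: replace each known given name with a placeholder."""
    return "".join(
        PLACEHOLDER if tok.lower() in GIVEN_NAME_SWAPS else tok
        for tok in TOKEN_RE.findall(business_name)
    )


def _match_case(source: str, target: str) -> str:
    """Give `target` the same capitalisation pattern as `source`."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def gender_swap(business_name: str) -> str:
    """Swap each known given name for its counterpart in the lexicon."""
    return "".join(
        _match_case(tok, GIVEN_NAME_SWAPS[tok.lower()])
        if tok.lower() in GIVEN_NAME_SWAPS else tok
        for tok in TOKEN_RE.findall(business_name)
    )


def augment(examples):
    """Approach 2: keep the originals and append swapped copies
    (same label) for every example that contains a known name."""
    augmented = list(examples)
    for name, label in examples:
        swapped = gender_swap(name)
        if swapped != name:
            augmented.append((swapped, label))
    return augmented


train = [("Jane's Bakery", "bakery"), ("City Plumbing", "plumber")]
masked = [(mask_given_names(n), y) for n, y in train]
augmented = augment(train)
```

In this sketch, masking removes the name signal entirely, while augmentation keeps it and adds a swapped copy of each affected example with the same label.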

Taxonomy

Topics: Authorship Attribution and Profiling · Imbalanced Data Classification Techniques · Spam and Phishing Detection