balance -- a Python package for balancing biased data samples
Tal Sarig, Tal Galili, Roee Eilat

TL;DR
The paper introduces 'balance', an open-source Python package that helps researchers analyze and correct bias in survey data samples to improve the accuracy of insights and machine learning models.
Contribution
It presents a new Python package with a three-step workflow for survey data: understanding the initial bias relative to a target population, adjusting for it by weighting the sample, and evaluating the final bias.
Findings
Provides a simple API for bias correction
Includes methods for bias assessment and adjustment
Enhances the reliability of survey-based insights
Abstract
Surveys are an important research tool, providing unique measurements on subjective experiences such as sentiment and opinions that cannot be measured by other means. However, because survey data is collected from a self-selected group of participants, directly inferring insights from it to a population of interest, or training ML models on such data, can lead to erroneous estimates or under-performing models. In this paper we present balance, an open-source Python package by Meta, offering a simple workflow for analyzing and adjusting biased data samples with respect to a population of interest. The balance workflow includes three steps: understanding the initial bias in the data relative to a target we would like to infer, adjusting the data to correct for the bias by producing weights for each unit in the sample based on propensity scores, and evaluating the final biases and the…
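The three steps named in the abstract can be illustrated with a minimal sketch. It is not the balance API or the authors' implementation: it uses only the Python standard library, a single categorical covariate, and a propensity score estimated from pooled group counts rather than from a fitted model. The function names `propensity_weights` and `weighted_share` and the toy data are hypothetical, introduced here for illustration only.

```python
from collections import Counter


def propensity_weights(sample_groups, target_groups):
    """Return one weight per sample unit.

    The propensity p = P(unit is in the sample | group) is estimated from
    pooled counts of sample and target units in each group. Each unit is
    weighted by the odds (1 - p) / p, which equals n_target / n_sample
    for its group.
    """
    n_s = Counter(sample_groups)
    n_t = Counter(target_groups)
    weights = []
    for g in sample_groups:
        p = n_s[g] / (n_s[g] + n_t[g])  # estimated propensity to be sampled
        weights.append((1 - p) / p)
    return weights


def weighted_share(groups, weights, group):
    """Weighted proportion of units that belong to `group`."""
    total = sum(weights)
    return sum(w for g, w in zip(groups, weights) if g == group) / total


# Toy data, invented for illustration: the sample over-represents "young".
sample = ["young"] * 8 + ["old"] * 2
target = ["young"] * 50 + ["old"] * 50

# Step 1: understand the initial bias (unweighted share vs. target share).
before = weighted_share(sample, [1.0] * len(sample), "young")
target_share = weighted_share(target, [1.0] * len(target), "young")

# Step 2: adjust by producing a weight for each sample unit.
w = propensity_weights(sample, target)

# Step 3: evaluate the bias that remains after weighting.
after = weighted_share(sample, w, "young")

print(f"target={target_share:.2f} before={before:.2f} after={after:.2f}")
```

In this toy case the weighted share of "young" moves from 0.8 in the raw sample to the target's 0.5. With several covariates, or continuous ones, group counts no longer suffice and the propensity would typically be estimated with a model such as logistic regression.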
Taxonomy
Topics: Forecasting Techniques and Applications · Computational and Text Analysis Methods · Data Analysis with R
