Normalization of zero-inflated data: An empirical analysis of a new indicator family
Robin Haunschild, Lutz Bornmann

TL;DR
This paper introduces a new indicator, MHq, for analyzing zero-inflated data, demonstrating its effectiveness in distinguishing quality levels where existing indicators fail.
Contribution
The paper proposes the MHq indicator, based on Mantel-Haenszel analysis, as a third member of the indicator family for sparse data that already includes MNPC and EMNPC.
Findings
MHq effectively distinguishes between quality levels in most cases.
MNPC and EMNPC often fail to differentiate quality levels.
MHq outperforms MNPC and EMNPC at separating peer-assessed quality levels in zero-inflated citation data.
Abstract
Recently, two new indicators (Equalized Mean-based Normalized Proportion Cited, EMNPC, and Mean-based Normalized Proportion Cited, MNPC) were proposed which are intended for sparse data. We propose a third indicator (Mantel-Haenszel quotient, MHq) belonging to the same indicator family. The MHq is based on Mantel-Haenszel (MH) analysis, an established method for pooling the data from multiple 2x2 contingency tables based on different subgroups. We test (using citations and assessments by peers) whether the three indicators can distinguish between different quality levels as defined on the basis of the assessments by peers (convergent validity). We find that the indicator MHq is able to distinguish between the quality levels in most cases while MNPC and EMNPC are not.
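The pooling step mentioned in the abstract can be illustrated with the textbook Mantel-Haenszel common odds ratio, which combines several 2x2 tables (one per subgroup) into a single estimate. The sketch below is only that textbook estimator, not necessarily the exact MHq formula used in the paper. The function name, the table layout (a, b, c, d), and the example counts are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# One 2x2 table per subgroup, e.g. per field or publication year (assumed layout):
#   a = focal set, cited        b = focal set, uncited
#   c = reference set, cited    d = reference set, uncited
Table = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(tables: Iterable[Table]) -> float:
    """Textbook MH pooled odds ratio: sum(a*d/n) / sum(b*c/n) over all tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty subgroup carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined: sum(b*c/n) is zero")
    return numerator / denominator


# Hypothetical counts for two subgroups, invented for this illustration.
example_tables = [(10, 90, 5, 95), (20, 80, 10, 90)]
pooled = mantel_haenszel_odds_ratio(example_tables)
print(pooled)
```

A pooled value above 1 would indicate that, across subgroups, the focal set has higher odds of being cited than the reference set. Weighting each table by its size is what lets sparse subgroups contribute without dominating the estimate.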
Taxonomy
Topics: Sensory Analysis and Statistical Methods · Spatial and Panel Data Analysis · Data Management and Algorithms
