Classification and Powerlaws: The Logarithmic Transformation
Loet Leydesdorff, Stephen Bensman

TL;DR
This paper examines how logarithmic transformation of the data affects the analysis of citation distributions. It finds that the transformation strongly reduces the variance needed for classification and can therefore be counterproductive, while powerlaws fit the tails of the distributions well.
Contribution
It demonstrates that logarithmic transformation can be counterproductive for classification and highlights the suitability of powerlaws for modeling citation distribution tails.
Findings
Logarithmic transformation strongly reduces the variance needed for classification.
Powerlaws fit the tails of citation distributions well.
Log transformation may distort the underlying data structure.
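The variance-reduction finding can be illustrated with a minimal sketch. The counts below are a hypothetical Zipf-like sequence, not data from the paper, and the helper name `coefficient_of_variation` and the use of log(1 + x) are assumptions made for illustration only.

```python
import math
import statistics

# Hypothetical citation counts: a Zipf-like rank-size sequence
# (an assumption for illustration, not data from the paper).
counts = [round(1000 / rank) for rank in range(1, 201)]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


# log(1 + x) keeps zero counts defined; which variant the paper uses
# is not stated in this summary.
logged = [math.log1p(c) for c in counts]

raw_var = statistics.pvariance(counts)
log_var = statistics.pvariance(logged)
raw_cv = coefficient_of_variation(counts)
log_cv = coefficient_of_variation(logged)

print(f"raw: variance={raw_var:.1f}, CV={raw_cv:.2f}")
print(f"log: variance={log_var:.2f}, CV={log_cv:.2f}")
```

On skewed input like this, both the variance and the relative spread shrink after the transformation, which is the effect the findings describe as leaving less variation for a classification to work with.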
Abstract
Logarithmic transformation of the data has been recommended by the literature in the case of highly skewed distributions such as those commonly found in information science. The purpose of the transformation is to make the data conform to the lognormal law of error for inferential purposes. How does this transformation affect the analysis? We factor analyze and visualize the citation environment of the Journal of the American Chemical Society (JACS) before and after a logarithmic transformation. The transformation strongly reduces the variance necessary for classificatory purposes and therefore is counterproductive to the purposes of the descriptive statistics. We recommend against the logarithmic transformation when sets cannot be defined unambiguously. The intellectual organization of the sciences is reflected in the curvilinear parts of the citation distributions, while negative…
Taxonomy
Topics: Qualitative Comparative Analysis Research · Computational and Text Analysis Methods
