Co-occurrence Matrices and their Applications in Information Science: Extending ACA to the Web Environment
Loet Leydesdorff, Liwen Vaughan

TL;DR
This paper clarifies the proper statistical analysis of co-occurrence matrices in information science, distinguishes between matrix types, and extends their application to web data using Google Scholar and social network visualization tools.
Contribution
It differentiates symmetrical and asymmetrical matrices, recommends suitable statistical techniques, and demonstrates their application to web data with new visualization methods.
Findings
Proper statistical methods depend on matrix type.
Co-occurrence matrices can be effectively extended to web data.
Visualization with Pajek reveals new insights into web-based co-occurrence data.
Abstract
Co-occurrence matrices, such as co-citation, co-word, and co-link matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involves understanding the nature of various types of matrices. This paper discusses the difference between a symmetrical co-citation matrix and an asymmetrical citation matrix as well as the appropriate statistical techniques that can be applied to each of these matrices, respectively. Similarity measures (like the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical co-citation matrix, but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web…
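The distinction drawn in the abstract can be illustrated with a small sketch. The toy matrix, the variable names, and the helper `cosine_columns` below are hypothetical and are not taken from the paper; the sketch only assumes that the asymmetrical citation matrix has citing documents as rows and cited authors as columns. It shows how the symmetrical co-citation matrix arises from the asymmetrical one, and how a cosine-based proximity matrix can be computed from the asymmetrical matrix directly.

```python
import numpy as np

# Hypothetical toy data: rows are citing documents, columns are cited authors.
# A 1 means the document cites the author. This is the asymmetrical matrix.
citation = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)

# Symmetrical co-citation matrix: entry (i, j) counts the documents that cite
# both author i and author j. The diagonal holds each author's citation count.
cocitation = citation.T @ citation


def cosine_columns(matrix):
    """Cosine similarity between the columns of `matrix` (hypothetical helper)."""
    norms = np.linalg.norm(matrix, axis=0)
    return (matrix.T @ matrix) / np.outer(norms, norms)


# Proximity matrix computed from the asymmetrical citation matrix,
# not from the co-citation matrix.
proximity = cosine_columns(citation)

print(cocitation)
print(np.round(proximity, 3))
```

In this sketch the cosine is applied once, to the columns of the asymmetrical matrix. Applying it again to `cocitation` would treat already-aggregated co-occurrence counts as if they were raw observations, which is the practice the abstract advises against.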
Taxonomy
Topics: Complex Network Analysis Techniques
