Correspondence Factor Analysis of Big Data Sets: A Case Study of 30 Million Words; and Contrasting Analytics using Apache Solr and Correspondence Analysis in R
Fionn Murtagh

TL;DR
This paper explores analytical techniques for a large text dataset of cooking recipes. It contrasts correspondence analysis in R with Apache Solr in order to uncover semantic relationships at multiple scales.
Contribution
It introduces a novel approach combining correspondence analysis with Solr for semantic mapping and querying of large text datasets, demonstrating new methods for multi-scale data analysis.
Findings
Effective semantic mapping using correspondence analysis
Successful integration of Solr with principal factor plane queries
Insights into term associations, such as between singular and plural forms
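One way to picture a "principal factor plane query" is a Solr filter that keeps only documents whose correspondence analysis coordinates fall inside a rectangle of the first two factors. The sketch below builds such a request URL for Solr's standard `/select` handler using range-query syntax (`field:[a TO b]`). It is an illustration under stated assumptions, not the paper's setup: the core name `recipes` and the numeric fields `factor1`/`factor2` are hypothetical and assume each recipe was indexed together with its factor coordinates.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def factor_plane_query_url(base_url, core, x_range, y_range,
                           text_query="*:*", x_field="factor1",
                           y_field="factor2", rows=10):
    """Build a Solr /select URL restricting results to a rectangle
    of the principal factor plane.

    Hypothetical assumption: each document was indexed with its CA
    coordinates stored in numeric fields named x_field and y_field.
    """
    # Range filter on both factor axes, in Lucene range-query syntax.
    fq = (f"{x_field}:[{x_range[0]} TO {x_range[1]}] AND "
          f"{y_field}:[{y_range[0]} TO {y_range[1]}]")
    params = {"q": text_query, "fq": fq, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


def run_query(url):
    """Send the request to a running Solr server and return the matching docs."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]
```

For example, `factor_plane_query_url("http://localhost:8983", "recipes", (-0.5, 0.0), (0.1, 0.6))` restricts results to one region of the factor plane. Putting the spatial restriction in `fq` rather than `q` keeps it out of relevance scoring, so a free-text query such as `text_query="garlic"` can still rank the documents inside that region.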
Abstract
We consider a large number of text data sets, namely cooking recipes. We investigate term distribution and other distributional properties of the data. Our aim is to examine analytical approaches that allow information to be mined at both high and low levels of detail. Metric space embedding is fundamental to our interest in the semantic properties of this data. We consider the projection of all data into analyses of aggregated versions of the data, and contrast that with the projection of aggregated versions of the data into analyses of all the data. Analogously, for the term set, we look at analysis of selected terms. We also look at inherent term associations, such as between singular and plural forms. In addition to using Correspondence Analysis in R for latent semantic space mapping, we also use Apache Solr. We describe setting up the Solr server and carrying out queries. A…
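The abstract's contrast between analysing all the data and projecting aggregated versions into that analysis rests on Correspondence Analysis and its supplementary-element projection. The following is a minimal Python/NumPy sketch of standard CA via the SVD of standardized residuals, plus the transition formula for projecting supplementary rows (for example, aggregated recipe groups) into an existing factor space. It is not the author's R code. Function names and the toy table in the usage note are illustrative assumptions.

```python
import numpy as np


def correspondence_analysis(N, n_factors=2):
    """CA of a nonnegative contingency table N (e.g. recipes x terms).

    Returns principal row coordinates F, principal column coordinates G,
    and the principal inertias (eigenvalues) of the retained factors.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_factors
    # Principal coordinates: F = D_r^{-1/2} U Sigma, G = D_c^{-1/2} V Sigma
    F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    G = (Vt[:k].T * sv[:k]) / np.sqrt(c)[:, None]
    return F, G, sv[:k] ** 2


def project_supplementary_rows(N_sup, G, inertias):
    """Project supplementary rows into an existing CA factor space.

    Transition formula: row coordinates = row profiles @ G / sqrt(inertia).
    Supplementary rows do not influence the factors; they are only placed.
    """
    N_sup = np.asarray(N_sup, dtype=float)
    profiles = N_sup / N_sup.sum(axis=1, keepdims=True)
    return profiles @ G / np.sqrt(inertias)
```

As a usage example, with a small hypothetical table `N = np.array([[10, 2, 3], [4, 12, 1], [2, 3, 9], [6, 6, 6]])`, calling `project_supplementary_rows(N[:2].sum(axis=0, keepdims=True), G, lam)` places the aggregate of the first two rows at the mass-weighted centroid of those rows' coordinates. This is why aggregated data can be read directly against an analysis of all the data. Projecting the active rows themselves reproduces `F`, which is a quick consistency check.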
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
