Designing Feature Vector Representations: A case study from Chemistry
Signe Sidwall Thygesen, Daniel Witschard, Andreas Kerren, Talha Bin Masood, and Ingrid Hotz

TL;DR
This paper investigates various feature vector representations for chemical data, analyzing their similarities and differences to improve data reduction, comparison, and clustering in chemical multivariate ensemble analysis.
Contribution
It provides a comparative analysis of different feature representations, highlighting their similarities, differences, and potential for future development in chemical data analysis.
Findings
Partial confirmation of expected behavior in feature representations
Surprising observations about distance distributions and clustering tendencies
Insights for future development of chemical feature vectors
Abstract
We present a case study investigating feature descriptors in the context of the analysis of chemical multivariate ensemble data. The data of each ensemble member consists of three parts: the design parameters for each ensemble member, field data resulting from the numerical simulations, and physical properties of the molecules. Since feature-based methods have the potential to reduce the data complexity and facilitate comparison and clustering, we are focusing on such methods. However, there are many options to design the feature vector representation and there is no obvious preference. To get a better understanding of the different representations, we analyze their similarities and differences. Thereby, we focus on three characteristics derived from the representations: the distribution of pairwise distances, the clustering tendency, and the rank-order of the pairwise distances. The…
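As an illustration of two of the characteristics named in the abstract (the distribution of pairwise distances and their rank-order), the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance and Spearman rank correlation as the comparison measure, neither of which is stated in the text above. The toy vectors `rep_a` and `rep_b` are hypothetical stand-ins for two feature representations of the same four ensemble members.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair (i < j) of feature vectors."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: the same 4 ensemble members under two representations.
rep_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
rep_b = [(0.0,), (1.0,), (2.0,), (5.0,)]

dist_a = pairwise_distances(rep_a)  # 6 distances for 4 members
dist_b = pairwise_distances(rep_b)

# A value near 1 means both representations order the member pairs similarly.
rho = spearman(dist_a, dist_b)
print(f"Spearman correlation of pairwise distances: {rho:.3f}")
```

The lists `dist_a` and `dist_b` can also be binned into histograms to compare the distance distributions directly. Clustering tendency would need a separate measure and is not covered by this sketch.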
Taxonomy
Topics: Computational Drug Discovery Methods · Advanced Chemical Sensor Technologies · Spectroscopy and Chemometric Analyses
