Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
Sarwan Ali

TL;DR
This paper introduces a comprehensive method to evaluate the capacity of embeddings to preserve structural and contextual information, combining extrinsic and neighborhood analysis with optimization for objective assessment.
Contribution
It proposes a novel, data-driven evaluation framework for measuring the representation capacity of embeddings, integrating multiple metrics and optimization techniques.
Findings
Effective assessment of embedding quality using combined metrics
Optimization improves selection of the best embedding configurations
Evaluation on biological datasets demonstrates the method's utility
Abstract
Effective representation of data is crucial in various machine learning tasks, as it captures the underlying structure and context of the data. Embeddings have emerged as a powerful technique for data representation, but evaluating their quality and capacity to preserve structural and contextual information remains a challenge. In this paper, we address this need by proposing a method to measure the "representation capacity" of embeddings. The motivation behind this work stems from the importance of understanding the strengths and limitations of embeddings, enabling researchers and practitioners to make informed decisions in selecting appropriate embedding models for their specific applications. By combining extrinsic evaluation methods, such as classification and clustering, with t-SNE-based neighborhood analysis, such as neighborhood agreement and trustworthiness, we provide a…
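The abstract names neighborhood agreement as one of the neighborhood-analysis metrics but, as excerpted here, does not define it. As a rough illustration only, the sketch below uses a common formulation: for each point, the fraction of its k nearest neighbors in the original embedding space that are also among its k nearest neighbors in the low-dimensional (for example, t-SNE) space, averaged over all points. The function names `knn_indices` and `neighborhood_agreement`, the Euclidean distance, and the averaging scheme are assumptions for this sketch, not the paper's confirmed definition.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest
    neighbors under Euclidean distance (the point itself is excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighborhood_agreement(high_dim, low_dim, k):
    """Average overlap between k-NN sets computed in the original
    embedding space and in the reduced space (assumed definition).

    Returns a value in [0, 1]; 1 means every k-NN set is preserved.
    """
    if len(high_dim) != len(low_dim):
        raise ValueError("both spaces must contain the same number of points")
    if not 0 < k < len(high_dim):
        raise ValueError("k must satisfy 0 < k < number of points")
    high_nn = knn_indices(high_dim, k)
    low_nn = knn_indices(low_dim, k)
    overlaps = [len(h & l) / k for h, l in zip(high_nn, low_nn)]
    return sum(overlaps) / len(overlaps)
```

This brute-force version compares every pair of points, so it is only practical for small samples. A score near 1 would suggest the 2D projection keeps local neighborhoods intact, while a score near 0 would suggest they are scrambled.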
Taxonomy
Topics: Computational Drug Discovery Methods · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
