SCHENO: Measuring Schema vs. Noise in Graphs
Justus Isaiah Hibshman, Adnan Hoq, Tim Weninger

TL;DR
SCHENO is a new metric for evaluating how well a schema-noise decomposition captures the core pattern in noisy graph data, aiding in pattern discovery and assessment of existing algorithms.
Contribution
Introduces SCHENO, a novel evaluation metric for schema-noise decomposition in graphs, and demonstrates its effectiveness in pattern discovery and algorithm assessment.
Findings
SCHENO effectively measures schema quality and noise level.
Using SCHENO as a fitness function uncovers diverse graph patterns.
Patterns found by existing graph mining algorithms are not always the best representation of the input data.
Abstract
Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e. decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represents the original graph data. We visually demonstrate what this metric prioritizes in small graphs, then show that if SCHENO is used as the fitness function for a simple optimization strategy, we can uncover a wide variety of patterns. Finally, we evaluate several well-known graph mining algorithms with this metric; we find that although they produce patterns, those patterns are not always the best representation of the input data.
Taxonomy
Topics: Data Visualization and Analytics · Complex Network Analysis Techniques
