Explaining Categorical Feature Interactions Using Graph Covariance and LLMs
Cencheng Shen, Darren Edge, Jonathan Larson, Carey E. Priebe

TL;DR
This paper introduces a scalable method that combines graph covariance and large language models to analyze and explain how categorical features interact over time in large datasets, with human trafficking data as the running example.
Contribution
It proposes an approach that uses graph covariance to identify significant feature interactions and queries LLMs to generate explanations of them, improving the interpretability of complex temporal categorical data.
Findings
Identified significant feature pairs and how their dependence changes over time.
Demonstrated scalability and accuracy through simulations.
Uncovered meaningful insights in the CTDC human trafficking dataset.
Abstract
Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is the global synthetic dataset from the Counter Trafficking Data Collaborative (CTDC) -- a global hub of human trafficking data containing over 200,000 anonymized records spanning from 2002 to 2022, with numerous categorical features for each record. In this paper, we propose a fast and scalable method for analyzing and extracting significant categorical feature interactions, and querying large language models (LLMs) to generate data-driven insights that explain these interactions. Our approach begins with a binarization step for categorical features using one-hot encoding, followed by the…
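The abstract breaks off right after the one-hot binarization step, so the graph covariance itself is not specified here. As a rough illustration only, the Python sketch below assumes a plain stand-in: each (feature, category) pair becomes a 0/1 column, the covariance of two binary columns is computed as P(both = 1) minus P(first = 1)·P(second = 1) (zero when the two indicators are independent), this is done separately for each time period, and pairs are ranked by how far their covariance moves across periods. The helper names (`build_columns`, `encode`, `covariance_by_period`, `largest_changes`) and the toy field names are hypothetical, not taken from the paper or the CTDC schema, and the sketch omits any significance test and the LLM step.

```python
from collections import defaultdict
from itertools import combinations


def build_columns(records, features):
    """One binary column per observed (feature, category) pair (the one-hot step)."""
    return sorted({(f, str(r[f])) for r in records for f in features
                   if r.get(f) is not None})


def encode(records, columns):
    """Binarize each record into a 0/1 row over the given columns."""
    return [[1 if r.get(f) is not None and str(r[f]) == v else 0
             for f, v in columns]
            for r in records]


def binary_covariance(matrix, i, j):
    """Population covariance of two 0/1 columns: P(i=1, j=1) - P(i=1) * P(j=1)."""
    n = len(matrix)
    p_i = sum(row[i] for row in matrix) / n
    p_j = sum(row[j] for row in matrix) / n
    p_ij = sum(row[i] * row[j] for row in matrix) / n
    return p_ij - p_i * p_j


def covariance_by_period(records, features, time_key):
    """Covariance of every cross-feature column pair, computed separately per period."""
    columns = build_columns(records, features)  # same columns for every period
    groups = defaultdict(list)
    for r in records:
        groups[r[time_key]].append(r)
    result = {}
    for period, recs in sorted(groups.items()):
        matrix = encode(recs, columns)
        covs = {}
        for i, j in combinations(range(len(columns)), 2):
            if columns[i][0] == columns[j][0]:
                continue  # categories of one feature are mutually exclusive; skip
            covs[(columns[i], columns[j])] = binary_covariance(matrix, i, j)
        result[period] = covs
    return result


def largest_changes(cov_by_period, top_k=5):
    """Rank pairs by the spread (max - min) of their covariance across periods."""
    periods = list(cov_by_period.values())
    if not periods:
        return []
    spread = {}
    for pair in periods[0]:
        values = [covs[pair] for covs in periods]
        spread[pair] = max(values) - min(values)
    return sorted(spread.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical toy records; the field names are illustrative, not CTDC's schema.
    records = [
        {"year": 2015, "exploitation": "labour", "gender": "male"},
        {"year": 2015, "exploitation": "sexual", "gender": "female"},
        {"year": 2020, "exploitation": "labour", "gender": "male"},
        {"year": 2020, "exploitation": "labour", "gender": "female"},
    ]
    covs = covariance_by_period(records, ["exploitation", "gender"], "year")
    for pair, score in largest_changes(covs, top_k=3):
        print(pair, round(score, 3))
```

Skipping pairs drawn from the same feature is a deliberate choice: one-hot columns of a single feature can never both be 1, so their covariance is negative by construction and says nothing about interactions. On a dataset with hundreds of thousands of records one would vectorize this (for example, the matrix product of the transposed binary matrix with itself, divided by n, gives every joint probability at once) rather than loop over pairs.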
Taxonomy
Topics: Data Mining Algorithms and Applications · Biomedical Text Mining and Ontologies · Advanced Graph Neural Networks
