Privacy Artifact ConnecTor (PACT): Embedding Enterprise Artifacts for Compliance AI Agents
Chenhao Fang, Yanqing Peng, Rajeev Rao, Matt Sarmiento, Wendy Summer, Arya Pudota, Alex Goncalves, Jordi Mola, Hervé Robert

TL;DR
PACT is an embedding-based graph system that links diverse enterprise artifacts to facilitate large-scale privacy compliance and risk assessment, significantly improving artifact retrieval and matching accuracy.
Contribution
The paper introduces PACT, a novel embeddings-driven graph that connects heterogeneous enterprise artifacts for privacy compliance, utilizing a fine-tuned DRAGON embedding model.
Findings
Recall@1 improved from 18% to 53%.
Query match rate increased from 9.6% to 69.7%.
Hitrate@1 rose from 25.7% to 44.9%.
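For readers unfamiliar with these retrieval metrics, the sketch below shows one common way to compute recall@k and hit rate@k. The definitions here are assumptions based on standard usage, since this summary does not state the paper's exact formulas; the function names and the toy data are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Mean over queries of |top-k results that are relevant| / |relevant|.

    ranked:   dict mapping query id -> list of result ids, best first
    relevant: dict mapping query id -> set of relevant result ids
    Queries with no relevant items are skipped (an assumption of this sketch).
    """
    scores = []
    for qid, rel in relevant.items():
        if not rel:
            continue
        top = set(ranked.get(qid, [])[:k])
        scores.append(len(top & rel) / len(rel))
    return sum(scores) / len(scores) if scores else 0.0


def hit_rate_at_k(ranked, relevant, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = []
    for qid, rel in relevant.items():
        if not rel:
            continue
        top = set(ranked.get(qid, [])[:k])
        hits.append(1.0 if top & rel else 0.0)
    return sum(hits) / len(hits) if hits else 0.0


# Toy data (hypothetical): two queries, each with a ranked result list.
ranked = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"a", "c"}, "q2": {"y"}}

print(recall_at_k(ranked, relevant, 1))    # 0.25
print(hit_rate_at_k(ranked, relevant, 1))  # 0.5
```

When every query has exactly one relevant item, the two metrics coincide; they differ only when a query has several relevant items, as q1 does here.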
Abstract
Enterprise environments contain a heterogeneous, rapidly growing collection of internal artifacts related to code, data, and many different tools. Critical information for assessing privacy risk and ensuring regulatory compliance is often embedded across these varied resources, each with their own arcane discovery and extraction techniques. Therefore, large-scale privacy compliance in adherence to governmental regulations requires systems to discern the interconnected nature of diverse artifacts in a common, shared universe. We present Privacy Artifact ConnecTor (PACT), an embeddings-driven graph that links millions of artifacts spanning multiple artifact types generated by a variety of teams and projects. Powered by the state-of-the-art DRAGON embedding model, PACT uses a contrastive learning objective with light fine-tuning to link artifacts via their textual components such as raw…
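The abstract mentions a contrastive learning objective but, as excerpted here, does not give its exact form. As an illustration only, the sketch below implements an InfoNCE-style loss with in-batch negatives, a common choice for fine-tuning dense retrievers. The function names, the temperature value, and the use of cosine similarity are assumptions of this sketch, not details confirmed by the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(queries, positives, temperature=0.05):
    """InfoNCE with in-batch negatives (illustrative, not PACT's confirmed loss).

    queries[i] and positives[i] are embeddings of two linked artifacts.
    Every positives[j] with j != i serves as a negative for queries[i].
    temperature=0.05 is a hypothetical default.
    """
    q = [_normalize(v) for v in queries]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i against every candidate in the batch.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        total += log_denom - logits[i]
    return total / len(q)


# Correctly paired embeddings give a lower loss than mismatched pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Minimizing this loss pulls the embeddings of linked artifacts together and pushes unrelated artifacts in the same batch apart, which is the general behavior a contrastive objective is meant to produce.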
